SQL SUM with Repeating Sub Entries - Best Practice?

I hit this issue regularly, but here is an example.
I have Order and Delivery tables. Each order can have one to many deliveries.
I need to report totals based on the Order table but also show the deliveries line by line.
I can write the SQL and the associated Access report for this with ease:
SELECT xxx
FROM [Order]
LEFT OUTER JOIN Delivery ON Delivery.OrderNo = [Order].OrderNo
...until I get to the summing element. I obviously want to sum each order only once, not once for every delivery on that order.
For example, the SQL might return the following for three orders (ignore the banality of the report; it is very much simplified):
Region  OrderNo  Value  Delivery Date
North   1        £100   12-04-2012
North   1        £100   14-04-2012
North   2        £73    01-05-2012
North   2        £73    03-05-2012
North   2        £73    07-05-2012
South   3        £50    23-04-2012
I would want to report:
Total Sales North - £173
Delivery 12-04-2012
Delivery 14-04-2012
Delivery 01-05-2012
Delivery 03-05-2012
Delivery 07-05-2012
Total Sales South - £50
Delivery 23-04-2012
The bit I'm referring to is the calculation of the £173 and the £50; the first of these obviously shouldn't be £419 (2 × £100 + 3 × £73)!
In the past I've used things like MAX (for a given order), but that seems like a fudge.
Surely there must be a standard answer to this seemingly common problem, but I can't find one.
I don't necessarily need the code - just a helpful point in the right direction.
Many thanks,
Chris.
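To see where the £419 comes from, here is a minimal sketch that uses Python's built-in sqlite3 module as a runnable stand-in for Access. The table names Orders and Delivery, the column names and the integer pound values are assumptions based on the sample rows (Orders is used because ORDER is a reserved word). Summing Value after the join counts each order once per delivery row:

```python
import sqlite3

# Assumed schema and data mirroring the sample rows; "Orders" avoids the reserved word ORDER.
SETUP = """
CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, Region TEXT, Value INTEGER);
CREATE TABLE Delivery (OrderNo INTEGER, DeliveryDate TEXT);
INSERT INTO Orders VALUES (1, 'North', 100), (2, 'North', 73), (3, 'South', 50);
INSERT INTO Delivery VALUES
    (1, '2012-04-12'), (1, '2012-04-14'),
    (2, '2012-05-01'), (2, '2012-05-03'), (2, '2012-05-07'),
    (3, '2012-04-23');
"""


def naive_region_totals():
    """Sum Value after the join, so each order is counted once per delivery row."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute("""
        SELECT o.Region, SUM(o.Value)
        FROM Orders o
        LEFT OUTER JOIN Delivery d ON d.OrderNo = o.OrderNo
        GROUP BY o.Region
        ORDER BY o.Region
    """).fetchall()
    con.close()
    return rows


print(naive_region_totals())  # [('North', 419), ('South', 50)]
```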

A ROLLUP operator may not look pretty. However, it returns the regular aggregates you see now and also shows subtotal rows for the order. This is what you're looking for.
SELECT xxx
FROM [Order]
LEFT OUTER JOIN Delivery ON Delivery.OrderNo = [Order].OrderNo
GROUP BY xxx
WITH ROLLUP;
I'm not exactly sure how the rest of your query is set up, but the output would look something like this:
Region  OrderNo  Value  Delivery Date
North   1        £100   12-04-2012
North   1        £100   14-04-2012
North   2        £73    01-05-2012
North   2        £73    03-05-2012
North   2        £73    07-05-2012
NULL    NULL     £419   NULL
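SQLite, used here through Python's sqlite3 module as a runnable stand-in, has no WITH ROLLUP, so this sketch emulates GROUP BY Region WITH ROLLUP by appending a NULL-region grand-total row with UNION ALL. The Orders/Delivery tables are the same assumed schema as the sample data. It illustrates that the subtotal rows are computed from whatever rows are fed in: over the joined rows, North still totals 419.

```python
import sqlite3

# Assumed schema and data mirroring the sample rows.
SETUP = """
CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, Region TEXT, Value INTEGER);
CREATE TABLE Delivery (OrderNo INTEGER, DeliveryDate TEXT);
INSERT INTO Orders VALUES (1, 'North', 100), (2, 'North', 73), (3, 'South', 50);
INSERT INTO Delivery VALUES
    (1, '2012-04-12'), (1, '2012-04-14'),
    (2, '2012-05-01'), (2, '2012-05-03'), (2, '2012-05-07'),
    (3, '2012-04-23');
"""

# Emulates "GROUP BY Region WITH ROLLUP": one subtotal row per region
# plus a grand-total row whose Region is NULL.
ROLLUP_SQL = """
WITH src AS ({source})
SELECT Region, SUM(Value) FROM src GROUP BY Region
UNION ALL
SELECT NULL, SUM(Value) FROM src
"""

# One row per delivery: each order's Value repeats for every delivery.
JOINED = """SELECT o.Region AS Region, o.Value AS Value
            FROM Orders o LEFT OUTER JOIN Delivery d ON d.OrderNo = o.OrderNo"""
# One row per order: each Value appears exactly once.
ORDER_GRAIN = "SELECT Region, Value FROM Orders"


def rollup(source_sql):
    """Return {region: subtotal, None: grand total} for the given row source."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(ROLLUP_SQL.format(source=source_sql)).fetchall()
    con.close()
    return dict(rows)


print(rollup(JOINED))
print(rollup(ORDER_GRAIN))
```

With the sample data, rollup(JOINED) gives North 419, South 50 and a grand total of 469, while rollup(ORDER_GRAIN), which feeds in one row per order, gives North 173, South 50 and 223 overall.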

I believe what you want is called a window function for your aggregate operation. It looks like the following:
SELECT xxx, SUM(Value) OVER (PARTITION BY [Order].Region) AS OrderTotal
FROM [Order]
LEFT OUTER JOIN Delivery ON Delivery.OrderNo = [Order].OrderNo
Here's the MSDN article. The PARTITION BY tells the SUM to be computed separately for each distinct Order.Region.
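A runnable sketch of the windowed SUM over the joined rows, again through Python's sqlite3 module (window functions need SQLite 3.25 or later) with the assumed Orders/Delivery tables. Because the window is evaluated after the join, every North delivery row carries 419, which is the double counting the edit addresses:

```python
import sqlite3

# Assumed schema and data mirroring the sample rows.
SETUP = """
CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, Region TEXT, Value INTEGER);
CREATE TABLE Delivery (OrderNo INTEGER, DeliveryDate TEXT);
INSERT INTO Orders VALUES (1, 'North', 100), (2, 'North', 73), (3, 'South', 50);
INSERT INTO Delivery VALUES
    (1, '2012-04-12'), (1, '2012-04-14'),
    (2, '2012-05-01'), (2, '2012-05-03'), (2, '2012-05-07'),
    (3, '2012-04-23');
"""


def windowed_totals():
    """Apply SUM(...) OVER (PARTITION BY Region) to the already-joined rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute("""
        SELECT o.Region, o.OrderNo, d.DeliveryDate,
               SUM(o.Value) OVER (PARTITION BY o.Region) AS OrderTotal
        FROM Orders o
        LEFT OUTER JOIN Delivery d ON d.OrderNo = o.OrderNo
        ORDER BY o.Region, d.DeliveryDate
    """).fetchall()
    con.close()
    return rows


for row in windowed_totals():
    print(row)  # every North row shows 419, the South row shows 50
```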
Edit: I just noticed that I missed what you said about orders being counted multiple times. One thing you could do is compute the SUM() before joining, in a CTE (guessing at your schema a bit):
WITH RegionOrders AS (
SELECT Region, OrderNo, Value, SUM(Value) OVER (PARTITION BY Region) AS RegionTotal
FROM [Order]
)
SELECT RO.Region, RO.OrderNo, RO.Value, D.DeliveryDate, RO.RegionTotal
FROM RegionOrders RO
LEFT OUTER JOIN Delivery D ON D.OrderNo = RO.OrderNo;
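Here is the same "sum first, then join" idea as a runnable sketch in SQLite via Python's sqlite3 module (3.25+ for window functions), with the assumed Orders/Delivery tables. Value is carried through the CTE so the outer SELECT can reference it, and each delivery row ends up with its region's order-level total:

```python
import sqlite3

# Assumed schema and data mirroring the sample rows.
SETUP = """
CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, Region TEXT, Value INTEGER);
CREATE TABLE Delivery (OrderNo INTEGER, DeliveryDate TEXT);
INSERT INTO Orders VALUES (1, 'North', 100), (2, 'North', 73), (3, 'South', 50);
INSERT INTO Delivery VALUES
    (1, '2012-04-12'), (1, '2012-04-14'),
    (2, '2012-05-01'), (2, '2012-05-03'), (2, '2012-05-07'),
    (3, '2012-04-23');
"""


def region_totals_before_join():
    """Total per region at order grain first, then fan out to the deliveries."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute("""
        WITH RegionOrders AS (
            SELECT Region, OrderNo, Value,
                   SUM(Value) OVER (PARTITION BY Region) AS RegionTotal
            FROM Orders
        )
        SELECT RO.Region, RO.OrderNo, RO.Value, D.DeliveryDate, RO.RegionTotal
        FROM RegionOrders RO
        LEFT OUTER JOIN Delivery D ON D.OrderNo = RO.OrderNo
        ORDER BY RO.Region, D.DeliveryDate
    """).fetchall()
    con.close()
    return rows


for row in region_totals_before_join():
    print(row)  # first row: ('North', 1, 100, '2012-04-12', 173)
```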

Related

SQL command to filter based on multiple tables and criteria

I am trying to learn SQL, and it's driving me nuts. I cannot seem to grasp the proper syntax to achieve my desired output. I am watching videos on Udemy and reading books on basic SQL to teach myself, but they all seem to fall short of helping me bridge a gap I can't seem to overcome.
I have a pretty good handle on the basics of the SELECT, FROM and WHERE clauses. I seem to be gaining knowledge of aggregate functions, but I am by no means an expert.
I have two tables, "Orders" and "OrderDet". "Orders" contains the CustomerName and the OrderNo, and OrderDet contains everything else, such as PartNo, DateFinished, OrderNo, etc.
I have a situation where multiple customers can order the same part number. I want to show the last order each customer placed for each part.
For example:
SELECT Orders.CustDesc, OrderDet.OrderNo, OrderDet.PartNo, OrderDet.DateFinished
FROM Orders
JOIN OrderDet ON Orders.OrderNo = OrderDet.OrderNo
ORDER BY OrderDet.PartNo, OrderDet.DateFinished
This query returns:
Customer  OrderNo  PartNo      Date Finished
--------------------------------------------------------
Cust 1    5032     12345678-1  NULL
Cust 2    10032    12345678-1  2019-06-05 14:54:25.853
Cust 2    1048     12345678-1  2019-07-08 00:00:00.000
Cust 1    5028     12345678-1  2019-09-30 11:45:45.960
Cust 1    5029     12345678-1  2019-09-30 12:49:35.713
Cust 1    5030     12345678-1  2019-09-30 13:04:57.333
Cust 1    5031     12345678-1  2019-10-10 13:58:22.653
I'm still learning when and how to use aggregate functions, but I can't seem to fully grasp the concept. I tried using MAX on the date column and grouping by customer and PartNo, but unless I remove the order number, the output never collapses down to what I want.
For example I used:
SELECT Orders.CustDesc, OrderDet.PartNo, MAX(OrderDet.DateFinished)
FROM Orders
JOIN OrderDet ON Orders.OrderNo = OrderDet.OrderNo
GROUP BY Orders.CustDesc, OrderDet.PartNo
ORDER BY OrderDet.PartNo
(That is, I removed OrderDet.OrderNo from the SELECT list and OrderDet.DateFinished from the ORDER BY.)
This returns the rows I want, but without all the columns I want.
Customer PartNo Date Finished
--------------------------------------------
Cust 2 12345678-1 2019-07-08 00:00:00.000
Cust 1 12345678-1 2019-10-10 13:58:22.653
As soon as I add OrderNo back in, I get the same output as the first query. I think I understand why: all the OrderNos are unique, so they cannot be grouped, but I can't grasp how to overcome this.
I understand this is a basic SQL problem, but I cannot work out how to get the output I want. In this example I want to see only two rows, one per customer, based on the last date the PartNo was finished, but with each row's entire contents shown, not just three columns.
Again, I am trying to learn this stuff, and I can only read and re-read the same basic content for so long. Everything I read seems to lack the information my brain needs for that "AH HA" moment.
Perhaps someone could help bridge this gap?
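One common way to bridge this gap, sketched here as an alternative to the window-function answer, is to keep the grouped query exactly as written and join it back to the detail rows on the grouping columns plus the MAX date, so every column of the matching row is available. The sketch runs in SQLite through Python's sqlite3 module; the table layout is assumed from the question and the dates are stored as ISO text so they compare correctly:

```python
import sqlite3

# Assumed layout: Orders(OrderNo, CustDesc), OrderDet(OrderNo, PartNo, DateFinished).
SETUP = """
CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, CustDesc TEXT);
CREATE TABLE OrderDet (OrderNo INTEGER, PartNo TEXT, DateFinished TEXT);
INSERT INTO Orders VALUES (5032, 'Cust 1'), (10032, 'Cust 2'), (1048, 'Cust 2'),
    (5028, 'Cust 1'), (5029, 'Cust 1'), (5030, 'Cust 1'), (5031, 'Cust 1');
INSERT INTO OrderDet VALUES
    (5032, '12345678-1', NULL),
    (10032, '12345678-1', '2019-06-05 14:54:25.853'),
    (1048, '12345678-1', '2019-07-08 00:00:00.000'),
    (5028, '12345678-1', '2019-09-30 11:45:45.960'),
    (5029, '12345678-1', '2019-09-30 12:49:35.713'),
    (5030, '12345678-1', '2019-09-30 13:04:57.333'),
    (5031, '12345678-1', '2019-10-10 13:58:22.653');
"""


def last_orders_via_join():
    """Join the grouped MAX(DateFinished) back to the detail rows to recover OrderNo."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute("""
        SELECT o.CustDesc, od.OrderNo, od.PartNo, od.DateFinished
        FROM Orders o
        JOIN OrderDet od ON o.OrderNo = od.OrderNo
        JOIN (
            -- the asker's grouped query, unchanged apart from the alias
            SELECT o2.CustDesc AS CustDesc, od2.PartNo AS PartNo,
                   MAX(od2.DateFinished) AS LastFinished
            FROM Orders o2
            JOIN OrderDet od2 ON o2.OrderNo = od2.OrderNo
            GROUP BY o2.CustDesc, od2.PartNo
        ) m ON m.CustDesc = o.CustDesc
           AND m.PartNo = od.PartNo
           AND m.LastFinished = od.DateFinished
        ORDER BY od.PartNo, od.DateFinished
    """).fetchall()
    con.close()
    return rows


for row in last_orders_via_join():
    print(row)
```

If two orders share the same latest DateFinished for a customer and part, both rows are returned.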
I am interpreting your question as wanting the most recent order for a given customer for each part that customer has ordered.
For this, I would recommend window functions:
select od.CustDesc, od.OrderNo, od.PartNo, od.DateFinished
from (select o.custdesc, od.orderno, od.partno, od.datefinished,
row_number() over (partition by o.custdesc, od.partno order by od.datefinished desc) as seqnum
from Orders o join
orderdet od
on o.OrderNo = od.OrderNo
) od
where seqnum = 1
order by od.PartNo, od.DateFinished;
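A runnable version of the ROW_NUMBER() approach, using SQLite 3.25+ through Python's sqlite3 module and the sample rows from the question (the Orders/OrderDet layout is assumed from the description). SQLite treats NULL as smaller than any value, so the order with no DateFinished sorts last under DESC and is never picked as seqnum 1; other databases may place NULLs differently:

```python
import sqlite3

# Assumed layout: Orders(OrderNo, CustDesc), OrderDet(OrderNo, PartNo, DateFinished).
SETUP = """
CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, CustDesc TEXT);
CREATE TABLE OrderDet (OrderNo INTEGER, PartNo TEXT, DateFinished TEXT);
INSERT INTO Orders VALUES (5032, 'Cust 1'), (10032, 'Cust 2'), (1048, 'Cust 2'),
    (5028, 'Cust 1'), (5029, 'Cust 1'), (5030, 'Cust 1'), (5031, 'Cust 1');
INSERT INTO OrderDet VALUES
    (5032, '12345678-1', NULL),
    (10032, '12345678-1', '2019-06-05 14:54:25.853'),
    (1048, '12345678-1', '2019-07-08 00:00:00.000'),
    (5028, '12345678-1', '2019-09-30 11:45:45.960'),
    (5029, '12345678-1', '2019-09-30 12:49:35.713'),
    (5030, '12345678-1', '2019-09-30 13:04:57.333'),
    (5031, '12345678-1', '2019-10-10 13:58:22.653');
"""


def last_orders_via_row_number():
    """Number each customer/part's rows newest first and keep row 1."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute("""
        SELECT CustDesc, OrderNo, PartNo, DateFinished
        FROM (
            SELECT o.CustDesc AS CustDesc, od.OrderNo AS OrderNo,
                   od.PartNo AS PartNo, od.DateFinished AS DateFinished,
                   ROW_NUMBER() OVER (PARTITION BY o.CustDesc, od.PartNo
                                      ORDER BY od.DateFinished DESC) AS seqnum
            FROM Orders o
            JOIN OrderDet od ON o.OrderNo = od.OrderNo
        ) t
        WHERE seqnum = 1
        ORDER BY PartNo, DateFinished
    """).fetchall()
    con.close()
    return rows


for row in last_orders_via_row_number():
    print(row)
```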

SQL-sum over dynamic period

I have 2 tables, Customers and Actions, and each customer has a unique ID (which appears in both tables).
Some of the customers became club members on a specific date (which differs from customer to customer). I'm trying to sum their purchases up to that date and find those who purchased more than (for example) 200 before becoming club members.
For example, I can have the following customer:
custID purchDate purchAmount
1 2015-05-12 100
1 2015-07-12 150
1 2015-12-29 320
Now, assume that custID=1 became a club member on 2015-12-25; in that case, I'd like to get SUM(purchAmount)=250 (note that I want this customer returned because 250 > 200).
I tried the following:
SELECT cust.custID, SUM(purchAmount)totAmount
FROM customers cust
JOIN actions act
ON cust.custID=act.custID
WHERE act.clubMember=1
AND cust.purchDate<act.clubMemberDate
GROUP BY cust.custID
HAVING totAmount>200;
Is this the right way to "attack" the problem, or should I use something like a WHILE loop over clubMemberDate (which, to tell the truth, I don't know how to write)?
I'm working with Teradata.
Your help will be appreciated.
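No loop should be needed: the per-customer membership date is just another join condition, so a set-based GROUP BY with HAVING does the job. This sketch runs in SQLite through Python's sqlite3 module rather than Teradata, and it assumes purchDate and purchAmount live in actions while clubMember and clubMemberDate live in customers (the query in the question has them the other way round; the sample rows suggest this layout). Customers 2 and 3 are hypothetical extra data: customer 2 spends only 150 before joining, and customer 3 is not a member:

```python
import sqlite3

# Assumed layout: membership columns on customers, purchase columns on actions.
SETUP = """
CREATE TABLE customers (custID INTEGER PRIMARY KEY, clubMember INTEGER, clubMemberDate TEXT);
CREATE TABLE actions (custID INTEGER, purchDate TEXT, purchAmount INTEGER);
INSERT INTO customers VALUES (1, 1, '2015-12-25'), (2, 1, '2015-06-01'), (3, 0, NULL);
INSERT INTO actions VALUES
    (1, '2015-05-12', 100), (1, '2015-07-12', 150), (1, '2015-12-29', 320),
    (2, '2015-05-01', 150), (2, '2015-07-01', 500),
    (3, '2015-03-01', 900);
"""


def pre_membership_big_spenders(threshold=200):
    """Sum each member's purchases made before their own clubMemberDate."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute("""
        SELECT c.custID, SUM(a.purchAmount) AS totAmount
        FROM customers c
        JOIN actions a ON a.custID = c.custID
        WHERE c.clubMember = 1
          AND a.purchDate < c.clubMemberDate   -- per-customer cutoff, no loop
        GROUP BY c.custID
        HAVING SUM(a.purchAmount) > ?
        ORDER BY c.custID
    """, (threshold,)).fetchall()
    con.close()
    return rows


print(pre_membership_big_spenders())  # [(1, 250)]
```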

Retrieving the latest transaction for an item

I have a table that lists lots of item transactions. I need to get the last record for each item (the one dated the latest).
For example my table looks like this:
Item Date TrxLocation
XXXXX 1/1/13 WAREHOUSE
XXXXX 1/2/13 WAREHOUSE
XXXXX 1/3/13 WAREHOUSE
aaaa 1/1/13 warehouse
aaaa 2/1/13 WAREHOUSE
I want the data to come back as follows:
XXXXX 1/3/13 WAREHOUSE
AAAA 2/1/13 WAREHOUSE
I tried doing something like this, but it brings back the wrong date:
select Distinct ITEMNMBR,
TRXLOCTN,
DATERECD
from TEST
where DateRecd = (select max(DATERECD)
from TEST)
Any help is appreciated.
You're on the right track. You just need to change your subquery to a correlated subquery, which means giving it some context from the outer query. If you run your subquery (select max(DATERECD) from TEST) by itself, what do you get? A single date, the latest in the whole table, regardless of item. You need to tie the subquery to the outer query by linking on the ITEMNMBR column, like this:
SELECT ITEMNMBR, TRXLOCTN, DATERECD
FROM TEST t
WHERE DateRecd = (
SELECT MAX (DATERECD)
FROM TEST tMax
WHERE tMax.ITEMNMBR = t.ITEMNMBR)
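A runnable check of the correlated subquery, using SQLite through Python's sqlite3 module. It assumes DATERECD is stored as ISO yyyy-mm-dd text (reading the sample dates as month/day/year), since strings like 1/3/13 do not sort chronologically:

```python
import sqlite3

# Sample rows from the question, with dates converted to ISO text (an assumption).
SETUP = """
CREATE TABLE TEST (ITEMNMBR TEXT, DATERECD TEXT, TRXLOCTN TEXT);
INSERT INTO TEST VALUES
    ('XXXXX', '2013-01-01', 'WAREHOUSE'),
    ('XXXXX', '2013-01-02', 'WAREHOUSE'),
    ('XXXXX', '2013-01-03', 'WAREHOUSE'),
    ('aaaa', '2013-01-01', 'warehouse'),
    ('aaaa', '2013-02-01', 'WAREHOUSE');
"""


def latest_per_item():
    """Keep rows whose date equals the latest date for the same item."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute("""
        SELECT ITEMNMBR, TRXLOCTN, DATERECD
        FROM TEST t
        WHERE DATERECD = (
            SELECT MAX(DATERECD)
            FROM TEST tMax
            WHERE tMax.ITEMNMBR = t.ITEMNMBR)  -- correlation to the outer row
        ORDER BY ITEMNMBR
    """).fetchall()
    con.close()
    return rows


print(latest_per_item())
```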
No need for a subquery. You are querying a single table, so you can select MAX(date) and GROUP BY item and location.
SELECT ITEMNMBR, MAX(DATERECD) AS max_dt_recd, TRXLOCTN
FROM TEST
GROUP BY ITEMNMBR, TRXLOCTN;
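The same sample data run through the GROUP BY version (SQLite via Python's sqlite3 module, ISO dates assumed) shows one consideration: because TRXLOCTN is in the GROUP BY, an item recorded under two distinct locations comes back once per location. SQLite compares text case-sensitively by default, so 'warehouse' and 'WAREHOUSE' form separate groups here; under a case-insensitive collation they would merge:

```python
import sqlite3

# Sample rows from the question, with dates converted to ISO text (an assumption).
SETUP = """
CREATE TABLE TEST (ITEMNMBR TEXT, DATERECD TEXT, TRXLOCTN TEXT);
INSERT INTO TEST VALUES
    ('XXXXX', '2013-01-01', 'WAREHOUSE'),
    ('XXXXX', '2013-01-02', 'WAREHOUSE'),
    ('XXXXX', '2013-01-03', 'WAREHOUSE'),
    ('aaaa', '2013-01-01', 'warehouse'),
    ('aaaa', '2013-02-01', 'WAREHOUSE');
"""


def latest_per_item_and_location():
    """One row per (item, location) pair with that pair's latest date."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute("""
        SELECT ITEMNMBR, MAX(DATERECD) AS max_dt_recd, TRXLOCTN
        FROM TEST
        GROUP BY ITEMNMBR, TRXLOCTN
        ORDER BY ITEMNMBR, TRXLOCTN
    """).fetchall()
    con.close()
    return rows


for row in latest_per_item_and_location():
    print(row)  # 'aaaa' appears twice: once per distinct TRXLOCTN value
```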

JavaDB: get ordered records in the subquery

I have the following table, "COMPANIES_BY_NEWS_REPUTATION", in my JavaDB database (this is random data, just to show the structure):
COMPANY | NEWS_HASH | REPUTATION | DATE
-------------------------------------------------------------------
Company A | 14676757 | 0.12345 | 2011-05-19 15:43:28.0
Company B | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company C | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company A | -7874564 | 0.12345 | 2011-05-19 15:43:28.0
One news_hash may relate to several companies while a company can relate to several news_hashes as well. Reputation and date are bound to the news_hash.
What I need to do is calculate the average reputation of the last 5 news items for every company. To do that, I feel that I need to use 'order by' and 'offset' in a subquery, as shown in the code below.
select COMPANY, avg(REPUTATION) from
(select * from COMPANY_BY_NEWS_REPUTATION order by "DATE" desc
offset 0 rows fetch next 5 row only) as TR group by COMPANY;
However, JavaDB allows neither ORDER BY, nor OFFSET in a subquery. Could anyone suggest a working solution for my problem please?
Which version of JavaDB are you using? According to the chapter TableSubquery in the JavaDB documentation, table subqueries do support order by and fetch next, at least in version 10.6.2.1.
Given that subqueries can be ordered and the size of the result set can be limited, the following (untested) query might do what you want:
select COMPANY, (select avg(REPUTATION)
from (select REPUTATION
from COMPANY_BY_NEWS_REPUTATION
where COMPANY = TR.COMPANY
order by "DATE" desc
fetch first 5 rows only))
from (select distinct COMPANY
from COMPANY_BY_NEWS_REPUTATION) as TR
This query retrieves all distinct company names from COMPANY_BY_NEWS_REPUTATION, then retrieves the average of the last five reputation rows for each company. I have no idea whether it will perform well enough; that will likely depend on the size of your data set and what indexes you have in place.
If you have a list of unique company names in another table, you can use that instead of the select distinct ... subquery to retrieve the companies for which to calculate averages.
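For databases whose window functions support PARTITION BY, a different technique avoids the correlated subquery: number each company's rows newest first with ROW_NUMBER() and average the first five. This is a sketch in SQLite (3.25+) through Python's sqlite3 module, not JavaDB; Derby's ROW_NUMBER() support may be more limited, so check its documentation before relying on it. The rows are made-up test data:

```python
import sqlite3


def avg_last_five():
    """Average each company's five most recent REPUTATION values."""
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE COMPANY_BY_NEWS_REPUTATION
                   (COMPANY TEXT, NEWS_HASH INTEGER, REPUTATION REAL, "DATE" TEXT)""")
    # Made-up data: Company A has six news rows (reputations 1.0 to 6.0), Company B has two.
    rows = [("Company A", 100 + i, float(i), f"2011-05-{i:02d} 12:00:00") for i in range(1, 7)]
    rows += [("Company B", 200, 0.5, "2011-05-01 09:00:00"),
             ("Company B", 201, 1.5, "2011-05-02 09:00:00")]
    con.executemany("INSERT INTO COMPANY_BY_NEWS_REPUTATION VALUES (?, ?, ?, ?)", rows)
    result = con.execute("""
        SELECT COMPANY, AVG(REPUTATION)
        FROM (
            SELECT COMPANY, REPUTATION,
                   ROW_NUMBER() OVER (PARTITION BY COMPANY ORDER BY "DATE" DESC) AS rn
            FROM COMPANY_BY_NEWS_REPUTATION
        ) t
        WHERE rn <= 5          -- newest five rows per company
        GROUP BY COMPANY
        ORDER BY COMPANY
    """).fetchall()
    con.close()
    return result


print(avg_last_five())  # [('Company A', 4.0), ('Company B', 1.0)]
```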

SQL count query not returning correct results

Struggling to get a query to work.
I have two tables:
tbl.candidates:
candidate_id
agency_business_unit_id
tbl.candidate_employment_tracker:
candidate_id
The candidate_employment_tracker table can contain several records for the same candidate_id, as it holds their working history for different clients.
The candidates table has exactly one row per candidate.
I'm trying to group by agency_business_unit_id and count the number of candidates in each unit who exist in candidate_employment_tracker.
E.g.
Agency Business Unit Id | Candidates
------------------------------------------------------------
100 | 2
987 | 1
12 | 90
The query I'm working on doesn't appear to work, because I'm getting the count of rows in candidate_employment_tracker rather than the number of candidates:
SELECT
abu.agency_business_unit_id,
abu.agency_business_unit_name,
count(c.candidate_id) AS candidateCount
FROM candidate_employment_tracker cet
INNER JOIN candidate c ON c.candidate_id = cet.candidate_id
INNER JOIN agency_business_unit abu ON abu.agency_business_unit_id = c.agency_business_unit_id
WHERE c.candidate_ni_number NOT REGEXP '^[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z] ?[0-9]{2} ?[0-9]{2} ?[0-9]{2} ?[ABCD]$'
GROUP BY abu.agency_business_unit_id
ORDER BY abu.agency_business_unit_name ASC
I've tried several approaches and the results are inconsistent. For instance, I know one of the agency business units has only 1 candidate, but the result is 2. This is because that candidate has 2 records in the candidate employment tracker table. I'll keep bashing away, but any help would be much appreciated.
Do you need the following?
count(DISTINCT c.candidate_id)
That would avoid the double counting where candidates have 2 records in the candidate employment tracker table.
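A small sketch of the difference, run in SQLite through Python's sqlite3 module. The agency_business_unit join and the MySQL REGEXP filter are left out, and the client column and rows are hypothetical; candidate 1 has two tracker rows, like the unit described in the question:

```python
import sqlite3

# Hypothetical data: candidates 1 and 3 each have two tracker rows; candidate 4 has none.
SETUP = """
CREATE TABLE candidate (candidate_id INTEGER PRIMARY KEY, agency_business_unit_id INTEGER);
CREATE TABLE candidate_employment_tracker (candidate_id INTEGER, client TEXT);
INSERT INTO candidate VALUES (1, 100), (2, 100), (3, 987), (4, 987);
INSERT INTO candidate_employment_tracker VALUES
    (1, 'client X'), (1, 'client Y'), (2, 'client X'),
    (3, 'client Z'), (3, 'client X');
"""


def counts_per_unit():
    """Compare COUNT (tracker rows) with COUNT(DISTINCT ...) (candidates) per unit."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute("""
        SELECT c.agency_business_unit_id,
               COUNT(c.candidate_id)          AS row_count,
               COUNT(DISTINCT c.candidate_id) AS candidate_count
        FROM candidate_employment_tracker cet
        INNER JOIN candidate c ON c.candidate_id = cet.candidate_id
        GROUP BY c.agency_business_unit_id
        ORDER BY c.agency_business_unit_id
    """).fetchall()
    con.close()
    return rows


print(counts_per_unit())  # [(100, 3, 2), (987, 2, 1)]
```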
Hmmm, this doesn't appear to work now that I look further into the results. When I compare the candidates for an agency business unit, I get inconsistent counts.