Sum SQL statement outputs many rows when I expect only one - sql

So I have 3 tables joined as follows: Clients to Accounts on ClientID, and Accounts to Holdings on AccountID = AccNum.
What I want to do is query for the sum of all the holdings that fall into the criteria specified for the clients in my query. Here is what I have:
SELECT Sum(Holdings.HoldingValue) AS SumOfHoldingValue
FROM (Clients INNER JOIN Accounts
ON Clients.ClientID = Accounts.ClientID)
INNER JOIN Holdings
ON Accounts.AccountID = Holdings.AccNum
GROUP BY Holdings.HoldingDate, Clients.Active, Clients.RiskCode, Clients.NewClient, Clients.BaseCurrency, Clients.ClientID
HAVING (((Holdings.HoldingDate)=#3/31/2013#)
AND ((Clients.Active)=True)
AND ((Clients.RiskCode) In (1,2))
AND ((Clients.NewClient)=True)
AND ((Clients.BaseCurrency)='GBP')
AND ((Clients.ClientID) Not In (10022,10082,10083)));
Here's an example of what I get as the result:
SumOfHoldingValue
1056071.96
466595.6
1074459.38
371142.54
814874.42
458203.65
8308697.09
254733.94
583796.33
443897.76
203787.11
1057445.84
1058751.26
317507.43
So there are quite a few criteria on the Clients table, but the result is a list of SumOfHoldingValue values when what I want is just one number, i.e. the sum of all the holding values. Why is it not grouping them all together to form one total?

Since you're not computing any aggregates on the values in the HAVING clause, I think you just want this:
SELECT Sum(Holdings.HoldingValue) AS SumOfHoldingValue
FROM (Clients INNER JOIN Accounts
ON Clients.ClientID = Accounts.ClientID)
INNER JOIN Holdings
ON Accounts.AccountID = Holdings.AccNum
WHERE (((Holdings.HoldingDate)=#3/31/2013#)
AND ((Clients.Active)=True)
AND ((Clients.RiskCode) In (1,2))
AND ((Clients.NewClient)=True)
AND ((Clients.BaseCurrency)='GBP')
AND ((Clients.ClientID) Not In (10022,10082,10083)));
With no GROUP BY clause, the whole filtered set forms a single group, so the query produces a single row.
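The difference can be reproduced with a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for Access, so the `#date#` and `True` literals are dropped. The trimmed-down tables and sample values are hypothetical. The first query keeps the original shape (GROUP BY ClientID plus HAVING) and returns one row per client. The second moves the filter into WHERE and returns one row.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the question's three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients  (ClientID INTEGER PRIMARY KEY, BaseCurrency TEXT);
    CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, ClientID INTEGER);
    CREATE TABLE Holdings (AccNum INTEGER, HoldingValue REAL);
    INSERT INTO Clients  VALUES (1, 'GBP'), (2, 'GBP'), (3, 'USD');
    INSERT INTO Accounts VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO Holdings VALUES (10, 100.0), (10, 50.0), (20, 25.0), (30, 999.0);
""")

JOINS = """
    FROM Clients
    INNER JOIN Accounts ON Clients.ClientID = Accounts.ClientID
    INNER JOIN Holdings ON Accounts.AccountID = Holdings.AccNum
"""

# Original shape: GROUP BY ClientID + HAVING gives one SUM per client.
per_client = conn.execute(
    "SELECT SUM(Holdings.HoldingValue)" + JOINS +
    "GROUP BY Clients.ClientID, Clients.BaseCurrency "
    "HAVING Clients.BaseCurrency = 'GBP' "
    "ORDER BY Clients.ClientID"
).fetchall()

# Corrected shape: filter in WHERE, no GROUP BY, so the whole set is one group.
total = conn.execute(
    "SELECT SUM(Holdings.HoldingValue)" + JOINS +
    "WHERE Clients.BaseCurrency = 'GBP'"
).fetchall()

print(per_client)  # [(150.0,), (25.0,)]
print(total)       # [(175.0,)]
```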

If you just want a single total, remove the GROUP BY. With the GROUP BY clause you get a separate total for every group.
If you need to filter the data, put the conditions into the WHERE clause instead.

Your query contains a GROUP BY clause, which returns each group as its own row.
You are also using a HAVING clause, which is applied after the GROUP BY. Usually it contains aggregate functions, such as HAVING COUNT(*) > 1. In your case, it is being used as a WHERE clause.
Try rewriting the query like this:
SELECT Sum(Holdings.HoldingValue) AS SumOfHoldingValue
FROM (Clients INNER JOIN Accounts
ON Clients.ClientID = Accounts.ClientID)
INNER JOIN Holdings
ON Accounts.AccountID = Holdings.AccNum
WHERE (((Holdings.HoldingDate)=#3/31/2013#)
AND ((Clients.Active)=True)
AND ((Clients.RiskCode) In (1,2))
AND ((Clients.NewClient)=True)
AND ((Clients.BaseCurrency)='GBP')
AND ((Clients.ClientID) Not In (10022,10082,10083)));
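The following sketch shows what the WHERE/HAVING distinction means in practice. It uses SQLite through Python's `sqlite3` module, with a hypothetical one-table layout and sample data. WHERE discards individual rows before grouping. HAVING discards whole groups after grouping, here with the `COUNT(*) > 1` example mentioned above.

```python
import sqlite3

# Hypothetical table: one row per holding, tagged with its client.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Holdings (ClientID INTEGER, HoldingValue REAL);
    INSERT INTO Holdings VALUES (1, 100.0), (1, 50.0), (2, 25.0), (3, 10.0), (3, 5.0);
""")

# WHERE filters rows *before* grouping: the 5.0 holding never reaches SUM.
where_rows = conn.execute("""
    SELECT ClientID, SUM(HoldingValue) FROM Holdings
    WHERE HoldingValue >= 10
    GROUP BY ClientID ORDER BY ClientID
""").fetchall()

# HAVING filters groups *after* grouping, typically on an aggregate:
# only clients with more than one holding survive.
having_rows = conn.execute("""
    SELECT ClientID, SUM(HoldingValue) FROM Holdings
    GROUP BY ClientID
    HAVING COUNT(*) > 1
    ORDER BY ClientID
""").fetchall()
```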

Related

SQL Server 2016 Sub Query Guidance

I am currently working on an assignment for my SQL class and I am stuck. I'm not looking for full code to answer the question, just a little nudge in the right direction. If you do provide full code, would you mind giving a short explanation of why you did it that way (so I can actually learn something)?
Here is the question:
Write a SELECT statement that returns three columns: EmailAddress, ShipmentId, and the order total for each Client. To do this, you can group the result set by the EmailAddress and ShipmentId columns. In addition, you must calculate the order total from the columns in the ShipItems table.
Write a second SELECT statement that uses the first SELECT statement in its FROM clause. The main query should return two columns: the Client’s email address and the largest order for that Client. To do this, you can group the result set by the EmailAddress column.
I am confused about how to pull in the EmailAddress column from the Clients table, because in order to join it I have to bring in other tables whose columns I don't use. I am assuming there is an easier way to do this using subqueries, as that is what we are working on at the moment.
Think of SQL as working with sets of data rather than just tables. A table is merely one kind of set. Viewed this way, the query below returns a set consisting of the entirety of another set, namely a table:
SELECT * FROM MyTable1
Now, if you were to only get the first two columns from MyTable1 you would return a different set that consisted only of columns 1 and 2:
SELECT col1, col2 FROM MyTable1
You can then treat this second set, a subset of the data, as a "table" as well, and query it like this:
SELECT
*
FROM (
SELECT
col1,
col2
FROM
MyTable1
) AS t -- SQL Server requires an alias for a derived table
This returns both columns of the inner set, because * expands to every column the derived table exposes.
So your inner query would consist of a GROUP BY clause and a SUM of the order value field. I won't write it for you, since you appear to be a student and it wouldn't be right to give you the entire answer. The key thing to understand, though, is this set thinking: you can wrap the ENTIRE query in parentheses and treat it as a table, as I have done above. Hopefully this helps.
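Here is the "query a query" idea in runnable form, using SQLite through Python's `sqlite3` module. `MyTable1` and its three columns are the placeholder names from the answer, filled with hypothetical rows. The outer `SELECT *` only sees the two columns the inner set exposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable1 (col1 INTEGER, col2 TEXT, col3 REAL);
    INSERT INTO MyTable1 VALUES (1, 'a', 0.5), (2, 'b', 1.5);
""")

# The inner SELECT is itself a set; wrapped in parentheses (and aliased, as
# SQL Server requires) it can be queried exactly like a table.
cur = conn.execute("""
    SELECT * FROM (
        SELECT col1, col2 FROM MyTable1
    ) AS t
    ORDER BY col1
""")
columns = [d[0] for d in cur.description]  # column names of the outer result
rows = cur.fetchall()
```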
You need a subquery, like this:
select emailaddress, max(OrderTotal) as MaxOrder
from
( -- Open the subquery
select Cl.emailaddress,
Sh.ShipmentID,
sum(SI.Value) as OrderTotal -- Use the line item value column in here
from Clients Cl -- First table
inner join Shipments Sh -- Join the shipments
on Sh.ClientID = Cl.ClientID
inner join ShipItems SI -- Now the items
on SI.ShipmentID = Sh.ShipmentID
group by Cl.emailaddress, Sh.ShipmentID -- here's your grouping for the sum() aggregation
) as t -- Close subquery; SQL Server requires an alias here
group by emailaddress -- group for the max()
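This sketch checks the two-level aggregation (SUM per shipment inside, MAX per email outside) on SQLite via Python's `sqlite3`. The schema is a hypothetical minimum. `Value` stands in for whatever line-item amount column ShipItems really has, as the comment in the query above says.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients   (ClientID INTEGER PRIMARY KEY, EmailAddress TEXT);
    CREATE TABLE Shipments (ShipmentID INTEGER PRIMARY KEY, ClientID INTEGER);
    CREATE TABLE ShipItems (ShipmentID INTEGER, Value REAL);  -- Value is assumed
    INSERT INTO Clients   VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
    INSERT INTO Shipments VALUES (100, 1), (101, 1), (200, 2);
    INSERT INTO ShipItems VALUES (100, 10.0), (100, 5.0), (101, 40.0), (200, 7.0);
""")

max_orders = conn.execute("""
    SELECT EmailAddress, MAX(OrderTotal) AS MaxOrder
    FROM (
        -- Step 1: one row per (email, shipment) with that order's total
        SELECT Cl.EmailAddress, Sh.ShipmentID, SUM(SI.Value) AS OrderTotal
        FROM Clients Cl
        INNER JOIN Shipments Sh ON Sh.ClientID = Cl.ClientID
        INNER JOIN ShipItems SI ON SI.ShipmentID = Sh.ShipmentID
        GROUP BY Cl.EmailAddress, Sh.ShipmentID
    ) AS t
    -- Step 2: keep only the largest order total per email
    GROUP BY EmailAddress
    ORDER BY EmailAddress
""").fetchall()
```

Ann's two orders total 15.0 and 40.0, so only 40.0 survives the outer MAX.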
For the first query, you can join Clients to Shipments (on ClientId) and Shipments to the ShipItems table (on ShipmentId).
Then group the results, and count or sum the total you need.
Using aliases for the tables is useful, especially when you select fields from joined tables that have the same column name.
select
c.EmailAddress,
i.ShipmentId,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
order by i.ShipmentId, c.EmailAddress;
Using that grouped query as a subquery, you can get the maximum total per EmailAddress.
select EmailAddress,
-- max(TotalShipItems) as MaxTotalShipItems,
max(TotalPriceDiscounted) as MaxTotalPriceDiscounted
from (
select
c.EmailAddress,
-- i.ShipmentId,
-- count(*) as TotalShipItems,
SUM((i.ShipItemPrice - i.ShipItemDiscountAmount) * i.Quantity) as TotalPriceDiscounted
from ShipItems i
join Shipments s on (s.ShipmentId = i.ShipmentId)
left join Clients c on (c.ClientId = s.ClientId)
group by i.ShipmentId, c.EmailAddress
) q
group by EmailAddress
order by EmailAddress
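Note that an ORDER BY inside a subquery is meaningless if you don't use TOP. In SQL Server, an ORDER BY in a derived table is actually an error unless TOP, OFFSET or FOR XML is also specified.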

Firebird can't recognize calculated column in group by clause

I have the following SQL:
select
inv.salesman_id,
(select salesman_goals.goal from salesman_goals
where salesman_goals.salesman_id = inv.salesman_id
and salesman_goals.group_id = g.group_id
and salesman_goals.subgroup_id = sg.subgroup_id
and salesman_goals.variation_id = v.variation_id)
as goal,
sum(i.quantity) as qnt
from invoiceitem i
inner join invoice inv on inv.invoice_id = i.invoice_id
inner join product p on p.product_id = i.product_id
left join groups g on g.group_id = p.group_id
left join subgroup sg on sg.group_id = g.group_id and sg.subgroup_id = p.subgroup_id
left join variation v on v.group_id = sg.group_id and v.subgroup_id = sg.subgroup_id and v.variation_id = p.variation_id
group by
1,2
It returns three columns: the salesman id, a subselect that gets the sales quantity goal, and the actual sales quantity.
Even though I group by the first and second columns, Firebird throws an error when executing the query:
Invalid expression in the select list (not contained in either an aggregate function or the GROUP BY clause).
What's the reason for this?
There is a column "in the select list (not contained in either an aggregate function or the GROUP BY clause)": namely, each outer column referenced in your subselect other than inv.salesman_id (g.group_id, sg.subgroup_id and v.variation_id). Such a column can have many values per group. When there is a GROUP BY (or just a HAVING, which treats the whole result as one group), the SELECT clause returns one row per group, so there is no single value to return for that column. So you want (as you posted in an answer yourself):
group by
inv.salesman_id,
g.group_id,
sg.subgroup_id,
v.variation_id
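This sketch illustrates why the subselect has no single value per salesman. It uses SQLite via Python's `sqlite3`, with a hypothetical two-table miniature in which `sales` flattens the invoice/product joins. SQLite is more permissive than Firebird and does not raise the error, so the sketch shows the underlying data problem rather than the error itself. Salesman 1 sells in two product groups with different goals, so grouping by salesman alone would have to pick between two goals. Adding the group column to the GROUP BY, as in the fix above, makes each goal well defined.

```python
import sqlite3

# Hypothetical miniature: one salesman, two product groups, one goal per group.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesman_goals (salesman_id INTEGER, group_id INTEGER, goal INTEGER);
    CREATE TABLE sales (salesman_id INTEGER, group_id INTEGER, quantity INTEGER);
    INSERT INTO salesman_goals VALUES (1, 10, 100), (1, 20, 30);
    INSERT INTO sales VALUES (1, 10, 60), (1, 10, 15), (1, 20, 12);
""")

GOAL = """(SELECT g.goal FROM salesman_goals g
           WHERE g.salesman_id = s.salesman_id AND g.group_id = s.group_id)"""

# The correlated subselect yields two different goals for salesman 1.
goals_for_salesman = conn.execute(
    f"SELECT DISTINCT s.salesman_id, {GOAL} AS goal FROM sales s ORDER BY goal"
).fetchall()

# Grouping by every outer column the subselect references fixes the ambiguity.
fixed = conn.execute(f"""
    SELECT s.salesman_id, {GOAL} AS goal, SUM(s.quantity) AS qnt
    FROM sales s
    GROUP BY s.salesman_id, s.group_id
    ORDER BY s.group_id
""").fetchall()
```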
OK, I found the solution to this problem.
The thing is, if a column in the GROUP BY clause is a subquery, the outer columns referenced inside that subquery must also appear in the GROUP BY. So in this case, all I had to do was:
group by
inv.salesman_id,
g.group_id,
sg.subgroup_id,
v.variation_id
And that's it. Hope it helps if someone has the same issue in the future.

Subquery performing a COUNT DISTINCT on the wrong grouping

I'm fairly new to SQL and have a problem with a subquery that is performing a count distinct on the wrong grouping. I'd appreciate any help at all with this.
I am querying attendees at sessions for a particular group for an MS SQL Server (SSRS 2008) report.
I am trying to join TblGroup, TblGroupSession and TblGroupSUAttendee and count the DISTINCT number of GroupSUAttendees per GROUP. The query below counts the distinct number of GroupSUAttendees per SESSION, so when I add the counts together for a group I get duplicates if a GroupSUAttendee has attended more than one session.
I need to keep one row per session in the query as I need that for other purposes, but it is fine for each session row to show the complete total of TblGroupSUAttendees for that group as I can reference that value once per group in my SSRS report.
Thoughts/advice/pointers much appreciated.
Thanks
Eils
SELECT
TblGroup.GroupId
,TblGroupSession.GroupSessionId
,TblGroupSession.GroupSessionDate
,TblGroupSUAttendee.GroupSUAttendeeCount
FROM
TblGroup
LEFT OUTER JOIN TblGroupSession
ON TblGroup.GroupId = TblGroupSession.GroupSessionGroupId
LEFT OUTER JOIN (select COUNT(DISTINCT GroupSUAttendeeId) AS GroupSUAttendeeCount,
GroupSUAttendeeGroupSessionId
FROM TblGroupSUAttendee
GROUP BY GroupSUAttendeeGroupSessionId) as TblGroupSUAttendee ON GroupSUAttendeeGroupSessionId = TblGroupSession.GroupSessionId
WHERE
GroupSessionDate >= @StartDate AND GroupSessionDate <= @EndDate
If you want to count attendees within groups, then use GROUP BY, but don't include per-session information. In other words, join the groups to the sessions, and the sessions to the attendees, in one query. Then aggregate by GroupId and count the distinct attendees:
SELECT g.GroupId,
COUNT(DISTINCT GroupSUAttendeeId) AS GroupSUAttendeeCount
FROM TblGroup g LEFT OUTER JOIN
tblGroupSession gs
ON g.GroupId = gs.GroupSessionGroupId LEFT OUTER JOIN
TblGroupSUAttendee ga
ON ga.GroupSUAttendeeGroupSessionId = gs.GroupSessionId
GROUP BY g.GroupId;
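The overcount and the fix can be reproduced on SQLite with Python's `sqlite3`, using hypothetical rows: attendee 500 attends both sessions of group 1. Summing per-session distinct counts counts attendee 500 twice, while one COUNT(DISTINCT ...) per group does not. The last query is a hypothetical extension beyond the answer: it joins the per-group total back onto each session, since the question needs one row per session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TblGroup (GroupId INTEGER);
    CREATE TABLE TblGroupSession (GroupSessionId INTEGER, GroupSessionGroupId INTEGER);
    CREATE TABLE TblGroupSUAttendee (GroupSUAttendeeId INTEGER,
                                     GroupSUAttendeeGroupSessionId INTEGER);
    INSERT INTO TblGroup VALUES (1);
    INSERT INTO TblGroupSession VALUES (11, 1), (12, 1);
    -- Attendee 500 goes to both sessions; 501 only to session 12.
    INSERT INTO TblGroupSUAttendee VALUES (500, 11), (500, 12), (501, 12);
""")

# Question's approach, summed per group: 1 + 2 = 3 (attendee 500 counted twice).
summed = conn.execute("""
    SELECT SUM(cnt) FROM (
        SELECT COUNT(DISTINCT GroupSUAttendeeId) AS cnt
        FROM TblGroupSUAttendee
        GROUP BY GroupSUAttendeeGroupSessionId
    ) AS per_session
""").fetchone()[0]

# Answer's approach: join everything, then one COUNT(DISTINCT ...) per group.
per_group = conn.execute("""
    SELECT g.GroupId, COUNT(DISTINCT ga.GroupSUAttendeeId) AS GroupSUAttendeeCount
    FROM TblGroup g
    LEFT OUTER JOIN TblGroupSession gs ON g.GroupId = gs.GroupSessionGroupId
    LEFT OUTER JOIN TblGroupSUAttendee ga
        ON ga.GroupSUAttendeeGroupSessionId = gs.GroupSessionId
    GROUP BY g.GroupId
""").fetchall()

# Hypothetical extension: keep one row per session, each showing its group total.
per_session_rows = conn.execute("""
    SELECT gs.GroupSessionId, totals.GroupSUAttendeeCount
    FROM TblGroupSession gs
    INNER JOIN (
        SELECT gs2.GroupSessionGroupId AS GroupId,
               COUNT(DISTINCT ga.GroupSUAttendeeId) AS GroupSUAttendeeCount
        FROM TblGroupSession gs2
        INNER JOIN TblGroupSUAttendee ga
            ON ga.GroupSUAttendeeGroupSessionId = gs2.GroupSessionId
        GROUP BY gs2.GroupSessionGroupId
    ) AS totals ON totals.GroupId = gs.GroupSessionGroupId
    ORDER BY gs.GroupSessionId
""").fetchall()
```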

how to use count with where clause in join query

SELECT
DEPTMST.DEPTID,
DEPTMST.DEPTNAME,
DEPTMST.CREATEDT,
COUNT(USRMST.UID)
FROM DEPTMASTER DEPTMST
INNER JOIN USERMASTER USRMST ON USRMST.DEPTID=DEPTMST.DEPTID
WHERE DEPTMST.CUSTID=1000 AND DEPTMST.STATUS='ACT'
I have tried several combinations, but I keep getting the error
Column 'DEPTMASTER.DeptID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
I also added a GROUP BY, but it's not working.
When using COUNT like that, you need to group by the other selected columns,
i.e.:
SELECT
DEPTMST.DEPTID,
DEPTMST.DEPTNAME,
DEPTMST.CREATEDT,
COUNT(USRMST.UID)
FROM DEPTMASTER DEPTMST
INNER JOIN USERMASTER USRMST ON USRMST.DEPTID=DEPTMST.DEPTID
WHERE DEPTMST.CUSTID=1000 AND DEPTMST.STATUS='ACT'
GROUP BY DEPTMST.DEPTID,
DEPTMST.DEPTNAME,
DEPTMST.CREATEDT
You are missing a GROUP BY:
SELECT DEPTMST.DEPTID,
DEPTMST.DEPTNAME,
DEPTMST.CREATEDT,
COUNT(USRMST.UID)
FROM DEPTMASTER DEPTMST
INNER JOIN USERMASTER USRMST ON USRMST.DEPTID=DEPTMST.DEPTID
WHERE DEPTMST.CUSTID=1000 AND DEPTMST.STATUS='ACT'
group by DEPTMST.DEPTID,
DEPTMST.DEPTNAME,
DEPTMST.CREATEDT
Aggregate functions such as AVG, COUNT and SUM need a GROUP BY clause whenever the select list also contains non-aggregated columns. If you don't use a GROUP BY clause, the function is performed on all the rows of the table as a single group.
E.g.:
Select count(*) from table;
This returns the count of all the rows in the table.
Select name, count(*) from table group by name
This first groups the table data by name and then returns the count for each of these groups.
So in your case, if you want the count of USRMST.UID, group by all the other columns in the select list.
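Both cases can be run against a hypothetical miniature of the two department tables, using SQLite via Python's `sqlite3`. Without GROUP BY, COUNT collapses all matching rows into one number. With GROUP BY on every non-aggregated column, it returns one count per department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPTMASTER (DEPTID INTEGER, DEPTNAME TEXT, CREATEDT TEXT,
                             CUSTID INTEGER, STATUS TEXT);
    CREATE TABLE USERMASTER (UID INTEGER, DEPTID INTEGER);
    INSERT INTO DEPTMASTER VALUES
        (1, 'Sales', '2020-01-01', 1000, 'ACT'),
        (2, 'Ops',   '2020-02-01', 1000, 'ACT'),
        (3, 'Old',   '2019-01-01', 1000, 'INA');
    INSERT INTO USERMASTER VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

FROM_WHERE = """
    FROM DEPTMASTER DEPTMST
    INNER JOIN USERMASTER USRMST ON USRMST.DEPTID = DEPTMST.DEPTID
    WHERE DEPTMST.CUSTID = 1000 AND DEPTMST.STATUS = 'ACT'
"""

# No GROUP BY: every matching row forms one group, so one row comes back.
all_users = conn.execute("SELECT COUNT(USRMST.UID)" + FROM_WHERE).fetchall()

# GROUP BY on all non-aggregated select-list columns: one count per department.
per_dept = conn.execute(
    "SELECT DEPTMST.DEPTID, DEPTMST.DEPTNAME, DEPTMST.CREATEDT, COUNT(USRMST.UID)"
    + FROM_WHERE +
    "GROUP BY DEPTMST.DEPTID, DEPTMST.DEPTNAME, DEPTMST.CREATEDT "
    "ORDER BY DEPTMST.DEPTID"
).fetchall()
```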

SQL - why is this 'where' needed to remove row duplicates, when I'm already grouping?

Why, in this query, is the final 'WHERE' clause needed to limit duplicates?
The first LEFT JOIN is linking programs to entities on a UID
The first INNER JOIN is linking programs to a subquery that gets statistics for those programs, by linking on a UID
The subquery (that gets the StatsForDistributorClubs subset) is doing a grouping on UID columns
So, I would've thought that this would all be joining unique records anyway so we shouldn't get row duplicates
So why the need to limit based on the final WHERE by ensuring the 'program' is linked to the 'entity'?
(irrelevant parts of query omitted for clarity)
SELECT LmiEntity.[DisplayName]
,StatsForDistributorClubs.*
FROM [Program]
LEFT JOIN
LMIEntityProgram
ON LMIEntityProgram.ProgramUid = Program.ProgramUid
INNER JOIN
(
SELECT e.LmiEntityUid,
sp.ProgramUid,
SUM(attendeecount) [Total attendance]
FROM LMIEntity e,
Timetable t,
TimetableOccurrence [to],
ScheduledProgramOccurrence spo,
ScheduledProgram sp
WHERE
t.LicenseeUid = e.lmientityUid
AND [to].TimetableOccurrenceUid = spo.TimetableOccurrenceUid
AND sp.ScheduledProgramUid = spo.ScheduledProgramUid
GROUP BY e.lmientityUid, sp.ProgramUid
) AS StatsForDistributorClubs
ON Program.ProgramUid = StatsForDistributorClubs.ProgramUid
INNER JOIN LmiEntity
ON LmiEntity.LmiEntityUid = StatsForDistributorClubs.LmiEntityUid
LEFT OUTER JOIN Region
ON Region.RegionId = LMIEntity.RegionId
WHERE (
[Program].LicenseeUid = LmiEntity.LmiEntityUid
OR
[LMIEntityProgram].LMIEntityUid = LmiEntity.LmiEntityUid
)
If you were grouping in your outer query, the extra criteria probably wouldn't be needed, but only your inner query is grouped. Joining to a grouped inner query can still return multiple records per outer row; for that matter, any of your JOINs could be the culprit.
Without seeing a sample of the duplicates it's hard to know where they originate, but grouping in the outer query would remove exact duplicates, or revised JOIN criteria could take care of it.
Your result set contains:
SELECT LmiEntity.[DisplayName]
,StatsForDistributorClubs.*
I suspect your duplicates come from LMIEntityProgram.
My conjecture: LMIEntityProgram is a bridge table with both LmiEntityUid and ProgramUid, but you join only on ProgramUid.
If a single ProgramUid has several LmiEntityUid values, you will get duplicates.
These are the duplicates you're filtering out in the WHERE clause:
[LMIEntityProgram].LMIEntityUid = LmiEntity.LmiEntityUid
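You can do it in the JOIN instead. An ON clause can only reference tables that appear before it in the FROM clause, so move this LEFT JOIN below the INNER JOIN to LmiEntity:
INNER JOIN LmiEntity
ON LmiEntity.LmiEntityUid = StatsForDistributorClubs.LmiEntityUid
LEFT JOIN LMIEntityProgram
ON LMIEntityProgram.ProgramUid = Program.ProgramUid
AND [LMIEntityProgram].LMIEntityUid = LmiEntity.LmiEntityUid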
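The bridge-table conjecture can be checked with a small SQLite sketch via Python's `sqlite3`. The tables are hypothetical simplifications: `Stats` stands in for the grouped StatsForDistributorClubs subquery, and program 1 is linked to two entities. Joining the bridge table on ProgramUid alone doubles every row. Either the WHERE filter from the question or the two-key LEFT JOIN (placed after LmiEntity) removes the duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Program (ProgramUid INTEGER);
    CREATE TABLE LmiEntity (LmiEntityUid INTEGER, DisplayName TEXT);
    CREATE TABLE LMIEntityProgram (LMIEntityUid INTEGER, ProgramUid INTEGER);
    CREATE TABLE Stats (LmiEntityUid INTEGER, ProgramUid INTEGER, Total INTEGER);
    INSERT INTO Program VALUES (1);
    INSERT INTO LmiEntity VALUES (100, 'Club A'), (200, 'Club B');
    INSERT INTO LMIEntityProgram VALUES (100, 1), (200, 1);  -- two links, one program
    INSERT INTO Stats VALUES (100, 1, 7), (200, 1, 3);
""")

JOINED_ON_PROGRAM_ONLY = """
    SELECT e.DisplayName, s.Total
    FROM Program p
    LEFT JOIN LMIEntityProgram ep ON ep.ProgramUid = p.ProgramUid
    INNER JOIN Stats s ON s.ProgramUid = p.ProgramUid
    INNER JOIN LmiEntity e ON e.LmiEntityUid = s.LmiEntityUid
"""

# Bridge joined on ProgramUid only: each stats row pairs with both links.
duplicated = conn.execute(JOINED_ON_PROGRAM_ONLY + "ORDER BY e.DisplayName").fetchall()

# The question's fix: filter the mismatched pairs out in WHERE.
filtered_in_where = conn.execute(
    JOINED_ON_PROGRAM_ONLY + "WHERE ep.LMIEntityUid = e.LmiEntityUid ORDER BY e.DisplayName"
).fetchall()

# The answer's fix: join the bridge on both keys, after LmiEntity is in scope.
joined_on_both = conn.execute("""
    SELECT e.DisplayName, s.Total
    FROM Program p
    INNER JOIN Stats s ON s.ProgramUid = p.ProgramUid
    INNER JOIN LmiEntity e ON e.LmiEntityUid = s.LmiEntityUid
    LEFT JOIN LMIEntityProgram ep
        ON ep.ProgramUid = p.ProgramUid AND ep.LMIEntityUid = e.LmiEntityUid
    ORDER BY e.DisplayName
""").fetchall()
```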