applying DISTINCT SUM - sql

I'm working on a code that would help me get the sum amount of GL Sub Heads (that's an accounting term) in a table. things is, there are multiple rows with the same GL Sub Head but with different amounts. I need to sum up the amounts of the table per GL Sub Head. that of course, involves the use of SUM and DISTINCT clauses. here's what I have so far:
select
gstt.bal_date,
sum(gstt.tot_dr_bal) as totalDr,
sum(gstt.tot_cr_bal) as totalCr
from(
select distinct
gam.gl_sub_head_code,
gstt.tot_dr_bal,
gstt.tot_cr_bal
from tbaadm.gam
left outer join tbaadm.eab
on gam.acid = eab.acid
and gam.bank_id = eab.bank_id
left outer join tbaadm.gstt
on gam.sol_id = gstt.sol_id
and gam.bank_id = gstt.bank_id
and gam.gl_sub_head_code = gstt.gl_sub_head_code
and gam.acct_crncy_code = gstt.crncy_code
where
gam.acct_ownership = 'O'
--and eab.eod_date = TO_DATE('3/24/2014', 'MM/DD/YYYY')
and gstt.bal_date = TO_DATE('3/24/2014', 'MM/DD/YYYY')
) group by gstt.bal_date, totalDr, totalCr
can't make the bloody thing work, though. I know I'm missing something, I just can't put my finger on what it is, and where should I put it in the code. Any help would be appreciated. If you need further clarifications, just ask.
EDIT
I've got the code to work now, somewhat. Because there's a problem.
select
bal_date,
gl_sub_head_code,
sum(tot_dr_bal),
sum(tot_cr_bal)
from(
select distinct
gam.gl_sub_head_code,
gstt.tot_dr_bal,
gstt.tot_cr_bal,
gstt.bal_date
from tbaadm.gam
left outer join tbaadm.eab
on gam.acid = eab.acid
and gam.bank_id = eab.bank_id
left outer join tbaadm.gstt
on gam.sol_id = gstt.sol_id
and gam.bank_id = gstt.bank_id
and gam.gl_sub_head_code = gstt.gl_sub_head_code
and gam.acct_crncy_code = gstt.crncy_code
where
gam.acct_ownership = 'O'
--and eab.eod_date = TO_DATE('3/24/2014', 'MM/DD/YYYY')
and gstt.bal_date = TO_DATE('3/24/2014', 'MM/DD/YYYY')
) group by bal_date, gl_sub_head_code
as you can see, there is a DISTINCT clause at the beginning of the subquery. whenever I remove that, TOAD outputs the same set of results. I get the same results regardless if I have the DISTINCT clause or not. I have a feeling there's something wrong with my code, as what I'm expecting is that the DISTINCT should make a difference.

"I get the same results regardless if I have the DISTINCT clause or not."
This means your query already returns a unique set of rows.
I have a feeling there's something wrong with my code, as what I'm expecting is that the DISTINCT should make a difference.
Why do you think that?
DISTINCT made a difference in your first query because you hadn't included bal_date in the sub-query's projection. So you have some gl_sub_head instances where the balances are the same for multiple days.
Except that you are filtering on bal_date so you should only get one value for it anyway. Hmmmm....
Clearly something is awry in your query. There are two obvious puzzlers.
Why do you have a LEFT OUTER JOIN with gstt? The hard filter on gstt.bal_date means you will only return records where gstt.bal_date is not null, so the result set would still be the same if that was an INNER JOIN.
Why bother querying eab at all? You don't use any values from it and the LEFT OUTER JOIN means it doesn't restrict the values retrieved from gam in anyway. Plus the fact that you have commented out the filter on eab.eod_date suggests that it currently produces a cross join which might generate multiple duplicate values which you then think you need a DISTINCT to eradicate.
So, I don't have an actual answer for you, but I think you need to re-visit your business logic and figure out what your query actually needs to do.

Related

If transaction within date range, then return customer name (and not all the transactions!)

This code is taking a significant amount of time to run. It's returning every single transaction within the date range but I just need to know if the customer has had at least one transaction, then include the CustomerID, CustomerName, Type, Sign, ReportingName.
I think I need to GROUP BY 'CustomerID' but again only if there was a transaction within the date range. And of course, I'm sure there is an optimal way to execute the below TSQL because it's quite slow at present.
Thanks in advance for any help!
SELECT [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
Check your indexes on fragmentation, to speed up your query. And make sure you have indexes.
If you just need one result, just TOP 1
SELECT TOP 1 [ABC].[dbo].[vwPrimary].[RelatedNameId] AS CustomerID
,[ABC].[dbo].[vwPrimary].[RelatedName] AS CustomerName
,[AFGPurchase].[IvL].[TaxTreatment].[ParticluarType] AS Type
,[AFGPurchase].[IvL].[Product].[Sign] AS [Sign]
,[AFGPurchase].[IvL].[Product].[ReportingName] AS ReportingName
,[AFGPurchase].[IvL].[Transaction].[EffectiveDate] AS 'Date'
FROM (((([AFGPurchase].[IvL].[Account]
INNER JOIN [AFGPurchase].[IvL].[Position] ON [AFGPurchase].[IvL].[Account].[AccountId] = [AFGPurchase].[IvL].[Position].[AccountId])
INNER JOIN [AFGPurchase].[IvL].[Product] ON [AFGPurchase].[IvL].[Position].[ProductID] = [AFGPurchase].[IvL].[Product].[ProductId])
INNER JOIN [ABC].[dbo].[vwPrimary] ON [AFGPurchase].[IvL].[Account].[ReportingEntityId] = [ABC].[dbo].[vwPrimary].[RelatedNameId])
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] ON [AFGPurchase].[IvL].[Account].[TaxTreatmentId] = [AFGPurchase].[IvL].[TaxTreatment].[TaxTreatmentId])
INNER JOIN [AFGPurchase].[IvL].[Transaction] ON [AFGPurchase].[IvL].[Position].[PositionId] = [AFGPurchase].[IvL].[Transaction].[PositionId]
WHERE ((([AFGPurchase].[IvL].[TaxTreatment].[RegistrationType]) LIKE 'NON%')
AND (([AFGPurchase].[IvL].[Product].[Sign])='XYZ2')
AND (([AFGPurchase].[IvL].[Position].[Quantity])<>0)
AND (([AFGPurchase].[IvL].[Transaction].[EffectiveDate]) between '2021-12-31' and '2022-12-31'))
If you only need to check for the existence of a row, and not actually get any data from it then use EXISTS() rather than INNER JOIN, e.g.
SELECT vpr.[RelatedNameId] AS CustomerID
,vpr.[RelatedName] AS CustomerName
,tt.[ParticluarType] AS Type
,prd.[Sign]
,prd.ReportingName
,tr.[EffectiveDate] AS [Date]
FROM [AFGPurchase].[IvL].[Account] AS acc
INNER JOIN [AFGPurchase].[IvL].[Position] AS pos ON acc.[AccountId] = pos.[AccountId]
INNER JOIN [AFGPurchase].[IvL].[Product] AS prd ON pos.[ProductID] = prd.[ProductId]
INNER JOIN [ABC].[dbo].[vwPrimary] AS vpr ON acc.[ReportingEntityId] = vpr.[RelatedNameId]
INNER JOIN [AFGPurchase].[IvL].[TaxTreatment] AS tt ON acc.[TaxTreatmentId] = tt.[TaxTreatmentId]
WHERE tt.[RegistrationType] LIKE 'NON%'
AND prd.[Sign]='XYZ2'
AND pos.[Quantity]<>0
AND EXISTS
( SELECT 1
FROM [AFGPurchase].[IvL].[Transaction] AS tr
WHERE tr.[PositionId] = pos.[PositionId]
AND tr.[EffectiveDate] BETWEEN '2021-12-31' AND '2022-12-31'
);
N.B. I have added in table aliases and removed all the unnecessary parentheses for readability - you may disagree that it is more readable, but I would expect that most people would agree
This may not offer any performance benefits over simply grouping by the columns you are selecting and keeping your joins as they are - SQL is after all a declarative language where you tell the engine what you want, not how to get it. So you may find that the two plans are the same because you are requesting the same result. Using EXISTS does have the advance of being more semantically tied to what you are trying to do though, so gives the optimiser the best chance of getting to the right plan. If you are still having performance issues, then you may need to inspect the execution plan, and see if it suggests any indexes.
Finally, if you are really still using SQL Server 2008 then you really need to start thinking about your upgrade path. It has been completely unsupported for over 3 years now.

COUNT is outputting more than one row

I am having a problem with my SQL query using the count function.
When I don't have an inner join, it counts 55 rows. When I add the inner join into my query, it adds a lot to it. It suddenly became 102 rows.
Here is my SQL Query:
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'
GROUP BY [fmsStage].[dbo].[File].[FILENUMBER]
Also, I have to do TOP 1 at the SELECT statement because it returns 51 rows with random numbers inside of them. (They are probably not random, but I can't figure out what they are.)
What do I have to do to make it just count the rows from [fmsStage].[dbo].[file].[FILENUMBER]?
First, your query would be much clearer like this:
SELECT COUNT(f.[FILENUMBER])
FROM [fmsStage].[dbo].[File] f INNER JOIN
[fmsStage].[dbo].[Container] c
ON v.[FILENUMBER] = c.[FILENUMBER]
WHERE f.[RELATIONCODE] = 'SHIP02' AND
c.DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08';
No GROUP BY is necessary. Otherwise you'll just one row per file number, which doesn't seem as useful as the overall count.
Note: You might want COUNT(DISTINCT f.[FILENUMBER]). Your question doesn't provide enough information to make a judgement.
Just remove GROUP BY Clause
SELECT COUNT([fmsStage].[dbo].[File].[FILENUMBER])
FROM [fmsStage].[dbo].[File]
INNER JOIN [fmsStage].[dbo].[Container]
ON [fmsStage].[dbo].[File].[FILENUMBER] = [fmsStage].[dbo].[Container].[FILENUMBER]
WHERE [fmsStage].[dbo].[File].[RELATIONCODE] = 'SHIP02'
AND [fmsStage].[dbo].[Container].DELIVERYDATE BETWEEN '2016-10-06' AND '2016-10-08'

left join excludes nulls and takes left value

SELECT p.ticket AS posted,
e.ticket AS settled,
Sum(e.amount)
FROM post AS p
LEFT JOIN settle AS e
ON p.ticket = e.ticket
WHERE p.date = '2016-05-10 00:00:00.000'
GROUP BY p.pticket,
e.eticket
ORDER BY posted
I understand that the grouping or where is the culprit but I've tried so many variations, the rows for the two tables are :
(Table1=Table2)
(total = (item + tax= total))
So the second table has 2 rows that I sum. I need the date because it has to much info and I've tried "is null" in dates and in other places but can't get this right. Instead of null, it shows the value of the left table as if they match.
So I figured out! Just love this site and wanted to leave input for whoever runs into this, please correct me if my information is not accurate as I would like to explain in detail but basically, from what I understand and from SQL Fundamentals, nulls are considered outer rows and therefore I would need a "Left Outer Join":
SELECT p.ticket AS posted,
e.ticket AS settled,
Sum(e.amount)
FROM post AS p
LEFT JOIN settle AS e
ON p.ticket = e.ticket
WHERE p.date = '2016-05-10 00:00:00.000'
GROUP BY p.pticket,
e.eticket
ORDER BY posted
A good rule that stood out for me and helps me check my joins with nulls is that the Where clause should only have conditions to the table where the nulls will not be included, so you keep the outer rows, with the exception of "is null". You could use that with a column on the non-preserved table and still get nulls.

Include missing years in Group By query

I am fairly new in Access and SQL programming. I am trying to do the following:
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
and group by year even when there is no amount in some of the years. I would like to have these years listed as well for a report with charts. I'm not certain if this is possible, but every bit of help is appreciated.
My code so far is as follows:
SELECT
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
Sum(SO_SalesOrderPaymentHistoryLineT.Amount) AS [Sum Of PaymentPerYear]
FROM
Base_CustomerT
INNER JOIN (
SO_SalesOrderPaymentHistoryLineT
INNER JOIN SO_SalesOrderT
ON SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId
) ON Base_CustomerT.CustomerId = SO_SalesOrderT.CustomerId
GROUP BY
Base_CustomerT.SalesRep,
SO_SalesOrderT.CustomerId,
Base_CustomerT.Customer,
SO_SalesOrderPaymentHistoryLineT.DatePaid,
SO_SalesOrderPaymentHistoryLineT.PaymentType,
Base_CustomerT.IsActive
HAVING
(((SO_SalesOrderPaymentHistoryLineT.PaymentType)=1)
AND ((Base_CustomerT.IsActive)=Yes))
ORDER BY
Base_CustomerT.SalesRep,
Base_CustomerT.Customer;
You need another table with all years listed -- you can create this on the fly or have one in the db... join from that. So if you had a table called alltheyears with a column called y that just listed the years then you could use code like this:
WITH minmax as
(
select min(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as minyear,
max(year(SO_SalesOrderPaymentHistoryLineT.DatePaid) as maxyear)
from SalesOrderPaymentHistoryLineT
), yearsused as
(
select y
from alltheyears, minmax
where alltheyears.y >= minyear and alltheyears.y <= maxyear
)
select *
from yearsused
join ( -- your query above goes here! -- ) T
ON year(T.SO_SalesOrderPaymentHistoryLineT.DatePaid) = yearsused.y
You need a data source that will provide the year numbers. You cannot manufacture them out of thin air. Supposing you had a table Interesting_year with a single column year, populated, say, with every distinct integer between 2000 and 2050, you could do something like this:
SELECT
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
Sum(NZ(data.Amount)) AS [Sum Of PaymentPerYear]
FROM
(SELECT * FROM Base_CustomerT INNER JOIN Year) AS base
LEFT JOIN
(SELECT * FROM
SO_SalesOrderT
INNER JOIN SO_SalesOrderPaymentHistoryLineT
ON (SO_SalesOrderPaymentHistoryLineT.SalesOrderId = SO_SalesOrderT.SalesOrderId)
) AS data
ON ((base.CustomerId = data.CustomerId)
AND (base.year = Year(data.DatePaid))),
WHERE
(data.PaymentType = 1)
AND (base.IsActive = Yes)
AND (base.year BETWEEN
(SELECT Min(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT)
AND (SELECT Max(year(DatePaid) FROM SO_SalesOrderPaymentHistoryLineT))
GROUP BY
base.SalesRep,
base.CustomerId,
base.Customer,
base.year,
ORDER BY
base.SalesRep,
base.Customer;
Note the following:
The revised query first forms the Cartesian product of BaseCustomerT with Interesting_year in order to have base customer data associated with each year (this is sometimes called a CROSS JOIN, but it's the same thing as an INNER JOIN with no join predicate, which is what Access requires)
In order to have result rows for years with no payments, you must perform an outer join (in this case a LEFT JOIN). Where a (base customer, year) combination has no associated orders, the rest of the columns of the join result will be NULL.
I'm selecting the CustomerId from Base_CustomerT because you would sometimes get a NULL if you selected from SO_SalesOrderT as in the starting query
I'm using the Access Nz() function to convert NULL payment amounts to 0 (from rows corresponding to years with no payments)
I converted your HAVING clause to a WHERE clause. That's semantically equivalent in this particular case, and it will be more efficient because the WHERE filter is applied before groups are formed, and because it allows some columns to be omitted from the GROUP BY clause.
Following Hogan's example, I filter out data for years outside the overall range covered by your data. Alternatively, you could achieve the same effect without that filter condition and its subqueries by ensuring that table Intersting_year contains only the year numbers for which you want results.
Update: modified the query to a different, but logically equivalent "something like this" that I hope Access will like better. Aside from adding a bunch of parentheses, the main difference is making both the left and the right operand of the LEFT JOIN into a subquery. That's consistent with the consensus recommendation for resolving Access "ambiguous outer join" errors.
Thank you John for your help. I found a solution which works for me. It looks quiet different but I learned a lot out of it. If you are interested here is how it looks now.
SELECT DISTINCTROW
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
FROM
Base_Customer_RevenueYearQ
LEFT JOIN CustomerPaymentPerYearQ
ON (Base_Customer_RevenueYearQ.RevenueYear = CustomerPaymentPerYearQ.[RevenueYear])
AND (Base_Customer_RevenueYearQ.CustomerId = CustomerPaymentPerYearQ.CustomerId)
GROUP BY
Base_Customer_RevenueYearQ.SalesRep,
Base_Customer_RevenueYearQ.CustomerId,
Base_Customer_RevenueYearQ.Customer,
Base_Customer_RevenueYearQ.RevenueYear,
CustomerPaymentPerYearQ.[Sum Of PaymentPerYear]
;

Why is my join giving me way too many results?

I have:
SELECT dv.VariableID ,
ds.DataSourceID ,
p.DataVariableDataSourceParamId ,
p.ParamCode ,
p.ParamDisplayName ,
p.DVDSParamControlType ,
p.DependentOnDVDSParamId ,
pv.ParamValue
FROM dbo.DataVariable dv
INNER JOIN dbo.DataVariableDataSource ds ON dv.DataSourceId = ds.DataSourceID
INNER JOIN dbo.DataVariableDataSourceParam p ON ds.DataSourceID = p.DataSourceId
INNER JOIN dbo.DataVariableDataSourceParamValue pv ON p.DataVariableDataSourceParamId = pv.DataVariableDataSourceParamId
WHERE dv.VariableID = #vid
ORDER BY dv.VariableID
When I just have the first two joins, I get what I want: 6 results. When I add the third, I get 660. I just want the ParamValue for the 6 records from the first 2 joins and I can't seem to figure out why this is breaking. I'm on my 12th hour of coding and I'm sure this is insanely obvious, but I could use a hand. Thanks in advance.
This is going to be because you have numerous rows in your pv table that match on DataVariableDataSourceParamId
You can verify by adding a SELECT DISTINCT. You may need to clean that table up or keep the distinct
However, the distinct will only help if pv.ParamValue is the same for all, otherwise you are rightfully getting more matches as what is happening is that you are finding all the matches for DataVariableDataSourceParamId and displaying them. If all those matches are the same value, then the distinct will indeed help, though