Why do I need GROUP BY in this simple query? - sql

UPDATE :
-----
The error might come from sum(si.amt_pd) in the subquery on the item table (as there is no relation):
select SUM(si.amt_pd)amt_pd from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where
Is there a workaround?
----------
I am trying to run this query. The query just fetches the amount of a month based on some tables. It is just a part of a big query.
select s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt) 'YEAR'
, MONTH(s.Sale_Dt) 'MONTH'
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd
, jan2011 = (
select SUM(si.amt_pd)amt_pd
from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where i.Item_Id = si.Product_Item_ID
and i.Item_Cd <> '*INT'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-01'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-31'
) INTO dbo.#a_acomparision
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S
, [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
WHERE SI.Sales_Id = S.Sales_Id
and s.Sales_Contract_Nbr in (
select distinct (Sales_Contract_Nbr)
from mountainviewContracts
where Sales_Contract_Nbr <> '')
But I am getting the following error message:
Msg 8120, Level 16, State 1, Line 1
Column 'HMIS_REPORTING.HMIS_RPT_ME.dbo.Sales.Sales_Contract_Nbr' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I just can't understand why my query needs a GROUP BY on sales_contract_nbr. Even if I add the GROUP BY clause, it tells me that si.Product_item_id and SI.sales_item_dt, which the inner query references, should also be contained in the GROUP BY clause.
Please help me out.
Thanks in advance

This is a very subtle problem. However, I think the subquery should be:
select SUM(i.amt_pd)amt_pd from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
That is, the alias should be i not si.
What is happening is that the SUM in the subquery is over a column from the outer query (si), so the SQL compiler treats it as an aggregate of the outer query and assumes the outer query is an aggregation query. As soon as it finds the first selected column that is neither aggregated nor in a GROUP BY, it complains with the message that you have.
By the way, you should use proper join syntax, so your FROM clause looks like:
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S join
[HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
on SI.Sales_Id = S.Sales_Id
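A minimal sketch of the aliasing point, using two hypothetical tables (sales_item and item) built inline; the names and values are invented for illustration and are not taken from the HMIS database. It assumes amt_pd lives on the item side, which the question does not confirm. Because SUM is applied to i.amt_pd, a column from the subquery's own FROM clause, the aggregate stays inside the subquery and the outer query needs no GROUP BY:

```sql
-- Hypothetical tables, invented for illustration only.
WITH sales_item (sales_id, product_item_id) AS (
    SELECT 1, 10 UNION ALL
    SELECT 2, 20
),
item (item_id, amt_pd) AS (
    SELECT 10, 5.00 UNION ALL
    SELECT 10, 7.50 UNION ALL
    SELECT 20, 1.25
)
SELECT si.sales_id,
       -- SUM is over i.amt_pd (inner alias), so this is an ordinary
       -- correlated scalar subquery, evaluated once per outer row.
       (SELECT SUM(i.amt_pd)
        FROM item i
        WHERE i.item_id = si.product_item_id) AS total_paid
FROM sales_item si;
```

This should return one row per sales_item row. If the SUM referenced only a column of the outer alias si instead, the aggregate would be attributed to the outer query, which is the situation described above.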

As @Gordon Linoff says, this is almost certainly because the query optimizer is treating this like a SUM operation on the outer query, normalizing away the subquery for "jan2011".
If the amt_pd column is present in the ITEM table, Gordon's solution is the right one.
If not, you have to add a GROUP BY clause, as below.
select s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt) 'YEAR'
, MONTH(s.Sale_Dt) 'MONTH'
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd
, jan2011 = (
select SUM(si.amt_pd)amt_pd
from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where i.Item_Id = si.Product_Item_ID
and i.Item_Cd <> '*INT'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-01'
and convert(varchar(10),SI.Sales_Item_Dt,126) <= '2011-01-31' -- upper bound of January (the question had >=)
) INTO dbo.#a_acomparision
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S
, [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
WHERE SI.Sales_Id = S.Sales_Id
and s.Sales_Contract_Nbr in (
select distinct (Sales_Contract_Nbr)
from mountainviewContracts
where Sales_Contract_Nbr <> '')
GROUP BY s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt) -- SQL Server does not accept select-list aliases in GROUP BY
, MONTH(s.Sale_Dt)
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd

Related

SQL code is removing duplicate values in error

My SQL code is removing duplicate values of "Time" within a Project Description. For example, if the same time value is recorded two or more times for a specific project, the query returns that value only once, skewing the results.
I've tried adding SUM(PMTT_DailyTime.Time) AS 'Sum of Time', but this creates a different problem and inaccurate results: it multiplies the summed values by the count of an irrelevant field.
SELECT View_ProjectsInfoDecoded.ProjectNbr
, View_ProjectsInfoDecoded.Department
, View_ProjectsInfoDecoded.ProjectDesc
, View_ProjectsInfoDecoded.ProjectStartDate
, View_ProjectsInfoDecoded.ProjectCompletionDate
, View_ProjectsInfoDecoded.VoidInd
, View_ProjectsInfoDecoded.ProjectStatus
, View_ProjectsInfoDecoded.ProjectType
, DatePart("yyyy", PMTT_DailyTime.ReportDate) AS [ReportYear]
, PMTT_DailyTime.Time
, PMTT_DailyTime.VoidInd
, View_ProjectsBuilderInfoDecoded.ProjectHealth
, View_ProjectsBuilderInfoDecoded.PrimaryBuilder
, View_ProjectsBuilderInfoDecoded.CurrentProjectStatus
FROM View_ProjectsInfoDecoded
LEFT JOIN View_ProjectsBuilderInfoDecoded ON View_ProjectsInfoDecoded.Department = View_ProjectsBuilderInfoDecoded.Department AND View_ProjectsInfoDecoded.ProjectNbr = View_ProjectsBuilderInfoDecoded.ProjectNbr
LEFT JOIN PMTT_DailyTime ON (View_ProjectsBuilderInfoDecoded.Department = PMTT_DailyTime.Department) AND (View_ProjectsBuilderInfoDecoded.ProjectNbr= PMTT_DailyTime.ProjectNbr)
WHERE (View_ProjectsInfoDecoded.Department IN ('107'))
And (View_ProjectsInfoDecoded.ProjectStatus <>'Cancel')
And (dbo.View_ProjectsInfoDecoded.VoidInd = 'N' OR dbo.View_ProjectsInfoDecoded.VoidInd IS NULL)
AND (PMTT_DailyTime.VoidInd = 'N' OR PMTT_DailyTime.VoidInd IS NULL)
AND ((DATEDIFF(MONTH, View_ProjectsInfoDecoded.ProjectCompletionDate,GETDATE()) <= 12) OR (View_ProjectsInfoDecoded.ProjectCompletionDate IS NULL) OR (View_ProjectsInfoDecoded.ProjectCompletionDate='' ))
GROUP BY View_ProjectsInfoDecoded.Department, View_ProjectsInfoDecoded.ProjectNbr
, View_ProjectsInfoDecoded.ProjectDesc, View_ProjectsInfoDecoded.ProjectStatus
, View_ProjectsInfoDecoded.EstStartDate, View_ProjectsInfoDecoded.ProjectStartDate
, View_ProjectsInfoDecoded.ProjectCompletionDate, View_ProjectsInfoDecoded.Complexity
, View_ProjectsInfoDecoded.ProjectType, View_ProjectsInfoDecoded.VoidInd
, View_ProjectsBuilderInfoDecoded.ProjectHealth, View_ProjectsBuilderInfoDecoded.PrimaryBuilder
, View_ProjectsBuilderInfoDecoded.CurrentProjectStatus, PMTT_DailyTime.VoidInd
, DatePart("yyyy", PMTT_DailyTime.ReportDate), PMTT_DailyTime.Time
I think this is an easy fix in the GROUP BY or in the type of joins used, but I'm not sure.
After refactoring your query to use aliases, add line breaks, and reorder columns, you will notice two additional GROUP BY fields that are not included in SELECT: p.EstStartDate and p.Complexity. As a result, the SELECT columns may show repeated values, one for each distinct combination of these two omitted fields.
For a more readable aggregate query, consider including in SELECT every column that appears in GROUP BY, without omitting any. Do note: per the SQL standard, you cannot have a non-aggregated column in SELECT that does not appear in GROUP BY. However, the reverse, which is what your query does, is valid. Alternatively, simply run the analogous SELECT DISTINCT without GROUP BY.
SELECT p.ProjectNbr, p.Department, p.ProjectDesc, p.ProjectStartDate, p.ProjectCompletionDate,
p.VoidInd, p.ProjectStatus, p.ProjectType, DatePart("yyyy", d.ReportDate) AS [ReportYear],
d.Time, d.VoidInd, b.ProjectHealth, b.PrimaryBuilder, b.CurrentProjectStatus
FROM (View_ProjectsInfoDecoded p
LEFT JOIN View_ProjectsBuilderInfoDecoded b
ON (p.Department = b.Department) AND (p.ProjectNbr = b.ProjectNbr))
LEFT JOIN PMTT_DailyTime d
ON (b.Department = d.Department) AND (b.ProjectNbr = d.ProjectNbr)
WHERE (p.Department IN ('107'))
AND (p.ProjectStatus <> 'Cancel')
AND (p.VoidInd = 'N' OR p.VoidInd IS NULL)
AND (d.VoidInd = 'N' OR d.VoidInd IS NULL)
AND ((DATEDIFF(MONTH, p.ProjectCompletionDate, GETDATE()) <= 12)
OR (p.ProjectCompletionDate IS NULL)
OR (p.ProjectCompletionDate='')
)
GROUP BY p.ProjectNbr, p.Department, p.ProjectDesc, p.ProjectStartDate, p.ProjectCompletionDate,
p.VoidInd, p.ProjectStatus, p.ProjectType, DatePart("yyyy", d.ReportDate),
d.Time, d.VoidInd, b.ProjectHealth, b.PrimaryBuilder, b.CurrentProjectStatus,
p.EstStartDate, p.Complexity -- ADDITIONAL NON-SELECT FIELDS
You are using the Group By clause, but have no aggregate function in your Select statement. This results in the same behavior as using Select Distinct. Removing the Group By clause will include the duplicate records you seem to be looking for.
SELECT View_ProjectsInfoDecoded.ProjectNbr, View_ProjectsInfoDecoded.Department, View_ProjectsInfoDecoded.ProjectDesc, View_ProjectsInfoDecoded.ProjectStartDate, View_ProjectsInfoDecoded.ProjectCompletionDate, View_ProjectsInfoDecoded.VoidInd, View_ProjectsInfoDecoded.ProjectStatus, View_ProjectsInfoDecoded.ProjectType, DatePart("yyyy", PMTT_DailyTime.ReportDate) AS [ReportYear], PMTT_DailyTime.Time, PMTT_DailyTime.VoidInd, View_ProjectsBuilderInfoDecoded.ProjectHealth, View_ProjectsBuilderInfoDecoded.PrimaryBuilder, View_ProjectsBuilderInfoDecoded.CurrentProjectStatus
FROM (View_ProjectsInfoDecoded LEFT JOIN View_ProjectsBuilderInfoDecoded ON (View_ProjectsInfoDecoded.Department = View_ProjectsBuilderInfoDecoded.Department) AND (View_ProjectsInfoDecoded.ProjectNbr = View_ProjectsBuilderInfoDecoded.ProjectNbr)) LEFT JOIN
PMTT_DailyTime ON (View_ProjectsBuilderInfoDecoded.Department = PMTT_DailyTime.Department) AND (View_ProjectsBuilderInfoDecoded.ProjectNbr= PMTT_DailyTime.ProjectNbr)
WHERE (View_ProjectsInfoDecoded.Department IN ('107')) And (View_ProjectsInfoDecoded.ProjectStatus <>'Cancel') And
(dbo.View_ProjectsInfoDecoded.VoidInd = 'N' OR dbo.View_ProjectsInfoDecoded.VoidInd IS NULL) AND (PMTT_DailyTime.VoidInd = 'N' OR PMTT_DailyTime.VoidInd IS NULL)
AND ((DATEDIFF(MONTH, View_ProjectsInfoDecoded.ProjectCompletionDate,GETDATE()) <= 12) OR (View_ProjectsInfoDecoded.ProjectCompletionDate IS NULL) OR (View_ProjectsInfoDecoded.ProjectCompletionDate='' ))
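To illustrate the point about GROUP BY behaving like SELECT DISTINCT when there is no aggregate, here is a small sketch on a hypothetical daily_time table (the table, columns, and values are invented for illustration):

```sql
-- Hypothetical data: project P1 has the value 2 logged twice.
WITH daily_time (project_nbr, hours) AS (
    SELECT 'P1', 2 UNION ALL
    SELECT 'P1', 2 UNION ALL
    SELECT 'P1', 3
)
-- Grouping by every selected column collapses the two identical rows,
-- just as SELECT DISTINCT would: 2 rows come back.
SELECT project_nbr, hours
FROM daily_time
GROUP BY project_nbr, hours;

WITH daily_time (project_nbr, hours) AS (
    SELECT 'P1', 2 UNION ALL
    SELECT 'P1', 2 UNION ALL
    SELECT 'P1', 3
)
-- Without GROUP BY all 3 rows come back, duplicates included.
SELECT project_nbr, hours
FROM daily_time;
```

The only difference between the two statements is the GROUP BY clause, which is what removes the repeated (P1, 2) row.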

Group by - Non-group-by expression in select clause

Sorry, you will have to bear with me as I am relatively new to SQL. I am querying an ODBC source in Excel. At the moment my dataset is massive, so I am looking to narrow it down by grouping it by company name and date. Not all my columns are calculated fields. I have put SUM on the two I need adding up. When I try to return the data, I get the error "Non-group-by expression in select clause".
Please can someone help me out.
SELECT
SopOrder_0.SooOrderNumber
, Company_0.CoaCompanyName
, InvoiceCreditItem_0.InvoiceCreditItemID
, InvoiceCreditItem_0.IciInvoiceApproved
, InvoiceCreditItem_0.InvoiceCreditID
, InvoiceCreditItem_0.CompanySiteID
, InvoiceCreditItem_0.VatID
, InvoiceCreditItem_0.NominalID
, InvoiceCreditItem_0.IciCreatedDate
, Sum(InvoiceCreditItem_0.IciTotalNettValue)
, Sum(InvoiceCreditItem_0.IciVatValue)
FROM
SBS.PUB.Company Company_0
, SBS.PUB.Customer Customer_0
, SBS.PUB.InvoiceCreditItem InvoiceCreditItem_0
, SBS.PUB.SopOrder SopOrder_0
WHERE
SopOrder_0.SopOrderID = InvoiceCreditItem_0.SopOrderID
AND InvoiceCreditItem_0.CompanyID = Customer_0.CompanyID
AND InvoiceCreditItem_0.CompanyID = Company_0.CompanyID
AND (Company_0.CoaCompanyName<>'ATOS')
AND InvoiceCreditItem_0.IciCreatedDate >= ?
GROUP BY
Company_0.CoaCompanyName, InvoiceCreditItem_0.IciCreatedDate
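As a hedged sketch of the usual rule behind this error: every column in the select list that is not wrapped in an aggregate generally has to appear in GROUP BY. The example below uses a hypothetical invoice_item table with invented values, not the SBS.PUB schema, and assumes the driver accepts common table expressions. It selects only the two grouping columns plus the two sums:

```sql
-- Hypothetical data, invented for illustration only.
WITH invoice_item (company_name, created_date, nett_value, vat_value) AS (
    SELECT 'Acme', '2020-01-01', 100, 20 UNION ALL
    SELECT 'Acme', '2020-01-01',  50, 10 UNION ALL
    SELECT 'Beta', '2020-01-02',  30,  6
)
SELECT company_name,
       created_date,
       SUM(nett_value) AS total_nett,   -- aggregated, so not in GROUP BY
       SUM(vat_value)  AS total_vat     -- aggregated, so not in GROUP BY
FROM invoice_item
GROUP BY company_name, created_date;    -- every non-aggregated column
```

Applied to the query above, that would mean either dropping the other non-aggregated columns from the select list or adding them to GROUP BY, at the cost of finer-grained groups.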

Column ambiguously defined in subquery with join on another subquery

I keep getting a column ambiguously defined error when joining two sub queries. However I have defined all my columns properly. I want to get all the data from the first query and add some data to it where available. How can this be fixed?
SELECT
sq2.month,
sq1.PRIMARY_MER_NUM ,
sq1.PRIMARY_EXT_MID ,
sq1.MER_DBA_NAM,
sq1.CLG_NUM,
sq1.ENT_NUM,
sq1.ENT_NAM,
sq1.MER_OPN_DTE,
sq1.MER_CLS_DTE,
sq1.MER_FST_DPST_DTE,
sq1.CLG_NUM ,
sq1.ENT_NUM,
sq2.gross_volume,
sq2.transaction_count
FROM
(SELECT DISTINCT
PRIMARY_MER_NUM ,
PRIMARY_EXT_MID ,
MER_DBA_NAM,
CLG_NUM,
ENT_NUM,
ENT_NAM,
MER_OPN_DTE,
MER_CLS_DTE,
MER_FST_DPST_DTE,
CLG_NUM ,
ENT_NUM
FROM
bi.t_mer_dim_na
WHERE
CLG_NUM = 7
AND ENT_NUM IN ('45810', '45811', '46849', '45948', '45824',
'46911', '45509', '46845', '48902')
) sq1
LEFT JOIN
(SELECT
TRUNC(BAT_REF_DTE, 'MM') AS month,
MER_NUM,
SUM(bat_prd_trn_dr_amt + bat_prd_trn_cr_amt) AS gross_volume,
SUM(bat_item_num) AS transaction_count
FROM
TDS.BAT_T3
WHERE
1 = 1
AND bat_ref_dte >= TRUNC(sysdate, 'MM')
GROUP BY
TRUNC(BAT_REF_DTE, 'MM'), MER_NUM) SQ2 ON sq1.primary_mer_num = sq2.MER_NUM;
You have CLG_NUM and ENT_NUM selected twice in your first derived table, sq1:
FROM (
select DISTINCT
PRIMARY_MER_NUM ,
PRIMARY_EXT_MID ,
MER_DBA_NAM,
CLG_NUM, --1
ENT_NUM, --1
ENT_NAM,
MER_OPN_DTE,
MER_CLS_DTE,
MER_FST_DPST_DTE,
CLG_NUM, --2
ENT_NUM --2
from bi.t_mer_dim_na
That makes selecting sq1.CLG_NUM and sq1.ENT_NUM ambiguous in your outer select (where you also select both twice).
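A minimal sketch of the fix, assuming Oracle syntax (FROM dual) and a hypothetical one-row stand-in for bi.t_mer_dim_na; the values are invented. Each column is listed once in the derived table, so every sq1.column reference in the outer query resolves to exactly one column:

```sql
-- Hypothetical stand-in for bi.t_mer_dim_na; Oracle syntax assumed.
WITH t_mer AS (
    SELECT 1001 AS primary_mer_num, 7 AS clg_num, '45810' AS ent_num
    FROM dual
)
SELECT sq1.primary_mer_num,
       sq1.clg_num,
       sq1.ent_num
FROM (
    SELECT DISTINCT
           primary_mer_num,
           clg_num,    -- listed once
           ent_num     -- listed once
    FROM t_mer
) sq1;
```

If a column genuinely has to appear twice in the output, giving the second copy its own alias in the derived table is one way to keep the names distinct.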

Use of MAX function in SQL query to filter data

The code below joins two tables, and I need to extract only the latest date per account, although the tables hold multiple accounts and history records. I wanted to use the MAX function, but I'm not sure how to incorporate it in this case. I am using My SQL server.
Appreciate any help!
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from
Property.dbo.PROP
inner join
Property.dbo.PROP_DATA on Property.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where
(PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
order by
PROP.EffDate DESC
Assuming your DBMS supports windowing functions and the with clause, a max windowing function would work:
with all_data as (
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label,
max (PROP.EffDate) over (partition by PROP.PolNo) as max_date
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where (PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
)
select
FileName, InsName, Status, FileTime, SubmissionNo,
PolNo, EffDate, ExpDate, Region, UnderWriter, Data, Label
from all_data
where EffDate = max_date
ORDER BY EffDate DESC
This also presupposes that any given account would not have two records on the same EffDate. If it can, and there is no other objective means to determine the latest record, you could also use row_number to pick a somewhat arbitrary record in the case of a tie.
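The row_number variant could be sketched as follows, on a hypothetical prop table with invented values. The tie-breaker column (file_id here) is an assumption; any column that makes the ordering deterministic would do:

```sql
-- Hypothetical data: policy A has two rows tied on the latest eff_date.
WITH prop (pol_no, file_id, eff_date) AS (
    SELECT 'A', 1, 42300 UNION ALL
    SELECT 'A', 2, 42400 UNION ALL
    SELECT 'A', 3, 42400 UNION ALL
    SELECT 'B', 4, 42500
),
ranked AS (
    SELECT pol_no, file_id, eff_date,
           -- Number rows within each policy, latest date first;
           -- file_id DESC breaks ties so exactly one row gets rn = 1.
           ROW_NUMBER() OVER (PARTITION BY pol_no
                              ORDER BY eff_date DESC, file_id DESC) AS rn
    FROM prop
)
SELECT pol_no, file_id, eff_date
FROM ranked
WHERE rn = 1;
```

Unlike the max(...) over (...) comparison, this keeps exactly one row per policy even when the latest date is shared by several records.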
Using straight SQL, you can use a self-join in a subquery in your where clause to eliminate values smaller than the max, or smaller than the top n largest, and so on. Just set the number in <= 1 to the number of top values you want per group.
Something like the following might do the trick, for example:
select
p.FileName
, p.InsName
, p.Status
, p.FileTime
, p.SubmissionNo
, p.PolNo
, p.EffDate
, p.ExpDate
, p.Region
, p.Underwriter
, pd.Data
, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
where (
select count(*)
from Actuarial.dbo.PROP p2
where p2.FileID = p.FileID
and p2.EffDate >= p.EffDate -- count rows at least as recent, so only the latest passes <= 1
) <= 1
and (
pd.Label in ('Occupancy' , 'OccupancyTIV')
and p.Status = 'Bound'
)
ORDER BY p.EffDate DESC
Have a look at this stackoverflow question for a full working example.
Not tested
with temp1 as
(
select foo
from bar
where xy = (select MAX(xy) from bar)
)
select PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from Actuarial.dbo.PROP
inner join temp1 t
on Actuarial.dbo.PROP.FileID = t.dbo.PROP_DATA.FileID
ORDER BY PROP.EffDate DESC

How can I grab the record set from a JOIN with the most recent datestamp?

I am writing a stored procedure that is grabbing some data from 3 tables. Right now, my output looks like this:
Rig 20 is listed twice. I would like to grab only the record with the most recent datestamp. So my query now looks like this:
SELECT
robinson_Rigs.rigId
, robinson_Rigs.rigName
, robinson_Clients.companyName
, robinson_Wells.wellName
, robinson_Wells.county
, max(robinson_Wells.startDate)
, robinson_Wells.directions
FROM robinson_Wells
JOIN robinson_Rigs ON robinson_wells.rigId = robinson_Rigs.rigId
JOIN robinson_Clients on robinson_Wells.clientId = robinson_Clients.clientId
group by robinson_Rigs.rigId
ORDER BY robinson_Rigs.rigId
But I am getting this error:
Msg 8120, Level 16, State 1, Procedure robinson_GetAllDrivingDirections, Line 14
Column 'robinson_Rigs.rigName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
How can I achieve this?
Simply grouping on everything but the StartDate will still return multiple rows for a rig if any other field contains a different value for the rig.
Instead, try something like this:
SELECT
robinson_Rigs.rigId
, robinson_Rigs.rigName
, robinson_Clients.companyName
, robinson_Wells.wellName
, robinson_Wells.county
, robinson_Wells.startDate
, robinson_Wells.directions
FROM robinson_Wells
JOIN robinson_Rigs ON robinson_wells.rigId = robinson_Rigs.rigId
JOIN
(
SELECT rigId
, MAX(startDate) AS MostRecentDate
FROM robinson_Wells
GROUP BY rigId
) latestRigDate ON robinson_Wells.rigId = latestRigDate.rigId
AND robinson_Wells.startDate = latestRigDate.MostRecentDate
JOIN robinson_Clients on robinson_Wells.clientId = robinson_Clients.clientId
ORDER BY robinson_Rigs.rigId
The joined subquery returns each rig id with its max (most recent) start date. Since startDate is a column of robinson_Wells (as in the question's select list), the subquery reads from that table; joining it back to robinson_Wells on rigId and startDate "filters" the table so that only the records with the most recent date for each rig are returned.
SELECT
robinson_Rigs.rigId
, robinson_Rigs.rigName
, robinson_Clients.companyName
, robinson_Wells.wellName
, robinson_Wells.county
, max(robinson_Wells.startDate)
, robinson_Wells.directions
FROM robinson_Wells
JOIN robinson_Rigs ON robinson_wells.rigId = robinson_Rigs.rigId
JOIN robinson_Clients on robinson_Wells.clientId = robinson_Clients.clientId
group by robinson_Rigs.rigId
, robinson_Rigs.rigName
, robinson_Clients.companyName
, robinson_Wells.wellName
, robinson_Wells.county
, robinson_Wells.directions
ORDER BY robinson_Rigs.rigId
Group by every non-aggregated column in the select list. SQL does not play nice unless you understand GROUP BY.