SQL code is removing duplicate values in error - sql

My SQL code is removing duplicate values of "Time" specific to Project Description. For example, if a time value for a specific project appears two or more times, the query returns the value only once, which skews the results.
I've tried adding SUM(PMTT_DailyTime.Time) AS 'Sum of Time', but this creates a different problem and inaccurate results: it multiplies the summed values by the number of values of an irrelevant field.
SELECT View_ProjectsInfoDecoded.ProjectNbr
, View_ProjectsInfoDecoded.Department
, View_ProjectsInfoDecoded.ProjectDesc
, View_ProjectsInfoDecoded.ProjectStartDate
, View_ProjectsInfoDecoded.ProjectCompletionDate
, View_ProjectsInfoDecoded.VoidInd
, View_ProjectsInfoDecoded.ProjectStatus
, View_ProjectsInfoDecoded.ProjectType
, DatePart("yyyy", PMTT_DailyTime.ReportDate) AS [ReportYear]
, PMTT_DailyTime.Time
, PMTT_DailyTime.VoidInd
, View_ProjectsBuilderInfoDecoded.ProjectHealth
, View_ProjectsBuilderInfoDecoded.PrimaryBuilder
, View_ProjectsBuilderInfoDecoded.CurrentProjectStatus
FROM View_ProjectsInfoDecoded
LEFT JOIN View_ProjectsBuilderInfoDecoded ON View_ProjectsInfoDecoded.Department = View_ProjectsBuilderInfoDecoded.Department AND View_ProjectsInfoDecoded.ProjectNbr = View_ProjectsBuilderInfoDecoded.ProjectNbr
LEFT JOIN PMTT_DailyTime ON (View_ProjectsBuilderInfoDecoded.Department = PMTT_DailyTime.Department) AND (View_ProjectsBuilderInfoDecoded.ProjectNbr= PMTT_DailyTime.ProjectNbr)
WHERE (View_ProjectsInfoDecoded.Department IN ('107'))
And (View_ProjectsInfoDecoded.ProjectStatus <>'Cancel')
And (dbo.View_ProjectsInfoDecoded.VoidInd = 'N' OR dbo.View_ProjectsInfoDecoded.VoidInd IS NULL)
AND (PMTT_DailyTime.VoidInd = 'N' OR PMTT_DailyTime.VoidInd IS NULL)
AND ((DATEDIFF(MONTH, View_ProjectsInfoDecoded.ProjectCompletionDate,GETDATE()) <= 12) OR (View_ProjectsInfoDecoded.ProjectCompletionDate IS NULL) OR (View_ProjectsInfoDecoded.ProjectCompletionDate='' ))
GROUP BY View_ProjectsInfoDecoded.Department, View_ProjectsInfoDecoded.ProjectNbr
, View_ProjectsInfoDecoded.ProjectDesc, View_ProjectsInfoDecoded.ProjectStatus
, View_ProjectsInfoDecoded.EstStartDate, View_ProjectsInfoDecoded.ProjectStartDate
, View_ProjectsInfoDecoded.ProjectCompletionDate, View_ProjectsInfoDecoded.Complexity
, View_ProjectsInfoDecoded.ProjectType, View_ProjectsInfoDecoded.VoidInd
, View_ProjectsBuilderInfoDecoded.ProjectHealth, View_ProjectsBuilderInfoDecoded.PrimaryBuilder
, View_ProjectsBuilderInfoDecoded.CurrentProjectStatus, PMTT_DailyTime.VoidInd
, DatePart("yyyy", PMTT_DailyTime.ReportDate), PMTT_DailyTime.Time
I think this is an easy fix in the Group function or the type of joins used. But not sure...

After refactoring your query to use aliases, add line breaks, and reorder columns, you will notice two additional GROUP BY fields that are not included in SELECT: p.EstStartDate and p.Complexity. As a result, the SELECT columns may show repeated values across the distinct groupings of these two omitted fields.
For a more readable aggregate query, consider listing the same columns in both the GROUP BY and SELECT clauses without omitting any. Do note: per the SQL standard, you cannot have a non-aggregated column in SELECT that does not appear in GROUP BY. However, the reverse, as in your query, is valid. Alternatively, simply run the analogous SELECT DISTINCT without GROUP BY.
SELECT p.ProjectNbr, p.Department, p.ProjectDesc, p.ProjectStartDate, p.ProjectCompletionDate,
p.VoidInd, p.ProjectStatus, p.ProjectType, DatePart("yyyy", d.ReportDate) AS [ReportYear],
d.Time, d.VoidInd, b.ProjectHealth, b.PrimaryBuilder, b.CurrentProjectStatus
FROM (View_ProjectsInfoDecoded p
LEFT JOIN View_ProjectsBuilderInfoDecoded b
ON (p.Department = b.Department) AND (p.ProjectNbr = b.ProjectNbr))
LEFT JOIN PMTT_DailyTime d
ON (b.Department = d.Department) AND (b.ProjectNbr = d.ProjectNbr)
WHERE (p.Department IN ('107'))
AND (p.ProjectStatus <> 'Cancel')
AND (p.VoidInd = 'N' OR p.VoidInd IS NULL)
AND (d.VoidInd = 'N' OR d.VoidInd IS NULL)
AND ((DATEDIFF(MONTH, p.ProjectCompletionDate, GETDATE()) <= 12)
OR (p.ProjectCompletionDate IS NULL)
OR (p.ProjectCompletionDate='')
)
GROUP BY p.ProjectNbr, p.Department, p.ProjectDesc, p.ProjectStartDate, p.ProjectCompletionDate,
p.VoidInd, p.ProjectStatus, p.ProjectType, DatePart("yyyy", d.ReportDate),
d.Time, d.VoidInd, b.ProjectHealth, b.PrimaryBuilder, b.CurrentProjectStatus,
p.EstStartDate, p.Complexity -- ADDITIONAL NON-SELECT FIELDS

You are using the Group By clause, but have no aggregate function in your Select statement. This results in the same behavior as using Select Distinct. Removing the Group By clause will include the duplicate records you seem to be looking for.
SELECT View_ProjectsInfoDecoded.ProjectNbr, View_ProjectsInfoDecoded.Department, View_ProjectsInfoDecoded.ProjectDesc, View_ProjectsInfoDecoded.ProjectStartDate, View_ProjectsInfoDecoded.ProjectCompletionDate, View_ProjectsInfoDecoded.VoidInd, View_ProjectsInfoDecoded.ProjectStatus, View_ProjectsInfoDecoded.ProjectType, DatePart("yyyy", PMTT_DailyTime.ReportDate) AS [ReportYear], PMTT_DailyTime.Time, PMTT_DailyTime.VoidInd, View_ProjectsBuilderInfoDecoded.ProjectHealth, View_ProjectsBuilderInfoDecoded.PrimaryBuilder, View_ProjectsBuilderInfoDecoded.CurrentProjectStatus
FROM (View_ProjectsInfoDecoded LEFT JOIN View_ProjectsBuilderInfoDecoded ON (View_ProjectsInfoDecoded.Department = View_ProjectsBuilderInfoDecoded.Department) AND (View_ProjectsInfoDecoded.ProjectNbr = View_ProjectsBuilderInfoDecoded.ProjectNbr)) LEFT JOIN
PMTT_DailyTime ON (View_ProjectsBuilderInfoDecoded.Department = PMTT_DailyTime.Department) AND (View_ProjectsBuilderInfoDecoded.ProjectNbr= PMTT_DailyTime.ProjectNbr)
WHERE (View_ProjectsInfoDecoded.Department IN ('107')) And (View_ProjectsInfoDecoded.ProjectStatus <>'Cancel') And
(dbo.View_ProjectsInfoDecoded.VoidInd = 'N' OR dbo.View_ProjectsInfoDecoded.VoidInd IS NULL) AND (PMTT_DailyTime.VoidInd = 'N' OR PMTT_DailyTime.VoidInd IS NULL)
AND ((DATEDIFF(MONTH, View_ProjectsInfoDecoded.ProjectCompletionDate,GETDATE()) <= 12) OR (View_ProjectsInfoDecoded.ProjectCompletionDate IS NULL) OR (View_ProjectsInfoDecoded.ProjectCompletionDate='' ))
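The effect is easy to reproduce on a toy table. The sketch below uses Python's built-in sqlite3 module with a hypothetical daily_time table (the table name, column names, and values are assumptions for illustration, not the asker's schema). It shows that grouping by every selected column collapses identical rows just as SELECT DISTINCT does, that dropping the GROUP BY keeps them, and that SUM over a real grouping key counts every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_time (project_nbr TEXT, report_year INTEGER, time REAL);
    -- Hypothetical data: project P1 logged 2.0 hours twice (two identical rows).
    INSERT INTO daily_time VALUES
        ('P1', 2019, 2.0),
        ('P1', 2019, 2.0),
        ('P1', 2019, 3.5);
""")

cols = "project_nbr, report_year, time"

# GROUP BY on every selected column, with no aggregate: duplicates collapse.
grouped = conn.execute(
    f"SELECT {cols} FROM daily_time GROUP BY {cols} ORDER BY time"
).fetchall()

# SELECT DISTINCT gives the same rows.
distinct = conn.execute(
    f"SELECT DISTINCT {cols} FROM daily_time ORDER BY time"
).fetchall()

# No GROUP BY: every row is returned, including the repeated 2.0.
plain = conn.execute(
    f"SELECT {cols} FROM daily_time ORDER BY time"
).fetchall()

# Aggregating per project sums all three rows.
total = conn.execute(
    "SELECT project_nbr, SUM(time) FROM daily_time GROUP BY project_nbr"
).fetchall()

print(grouped)
print(plain)
print(total)
```

If a SUM like this comes out multiplied in the real query, a likely cause is that one of the joined views returns several rows per project, so each time row is repeated before it is summed; that is an assumption about the asker's data, not something the question confirms.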

Related

SQL joining a separate query as a column in original query

I am struggling with joining the below two queries.
My main query is the first one below and what I am trying to achieve is the output of query 2's opt in rate column as a new column in my original query.
SELECT CAST("public"."event_event"."date_created" AS date) AS "date_created", "marketing_message__via__messag"."name" AS "name", count(*) AS "count"
FROM "public"."event_event"
LEFT JOIN "public"."marketing_message" "marketing_message__via__messag" ON "public"."event_event"."message_id" = "marketing_message__via__messag"."id" LEFT JOIN "public"."marketing_campaign" "marketing_campaign__via__campa" ON "public"."event_event"."campaign_id" = "marketing_campaign__via__campa"."id"
WHERE (date_trunc('month', CAST("public"."event_event"."date_created" AS timestamp)) = date_trunc('month', CAST(now() AS timestamp))
AND "marketing_message__via__messag"."name" IS NOT NULL AND ("marketing_message__via__messag"."name" <> ''
OR "marketing_message__via__messag"."name" IS NULL) AND "public"."event_event"."stage" = 'Lead' AND ("marketing_campaign__via__campa"."name" = 'a'
OR "marketing_campaign__via__campa"."name" = 'b'
OR "marketing_campaign__via__campa"."name" = 'c'
OR "marketing_campaign__via__campa"."name" = 'c1' OR "marketing_campaign__via__campa"."name" = 'd'
OR "marketing_campaign__via__campa"."name" = 'e'))
GROUP BY CAST("public"."event_event"."date_created" AS date), "marketing_message__via__messag"."name"
ORDER BY CAST("public"."event_event"."date_created" AS date) ASC, "marketing_message__via__messag"."name" ASC
I would like to add the following query output for "Opt-In rate" to my query above as a new column.
Query 2
SELECT marketing_message.message_text,
cast(sum((event_event.status='Opt-in')::int) as decimal) / nullif(sum((event_event.status='Sent')::int), 0)* 100 as "Opt-in Rate (Sent)"
FROM event_event
JOIN marketing_campaign ON event_event.campaign_id = marketing_campaign.id
JOIN marketing_message ON event_event.message_id = marketing_message.id
WHERE True [[AND {{campaign_name}}]] [[AND {{date_created}}]]
GROUP BY marketing_message.message_text
The short answer is that you can't.
Your first query is keyed on date and name (the GROUP BY clause) and your second query is keyed on message_text. As there is no relationship between the two datasets there is no way of joining/combining them in a single query.
You would need to find a common field (or fields) between the two datasets and join on them - but this won't give the same results that you have at the moment as your queries would need to be completely restructured.

Column ambiguously defined in subquery with join on another subquery

I keep getting a column ambiguously defined error when joining two sub queries. However I have defined all my columns properly. I want to get all the data from the first query and add some data to it where available. How can this be fixed?
SELECT
sq2.month,
sq1.PRIMARY_MER_NUM ,
sq1.PRIMARY_EXT_MID ,
sq1.MER_DBA_NAM,
sq1.CLG_NUM,
sq1.ENT_NUM,
sq1.ENT_NAM,
sq1.MER_OPN_DTE,
sq1.MER_CLS_DTE,
sq1.MER_FST_DPST_DTE,
sq1.CLG_NUM ,
sq1.ENT_NUM,
sq2.gross_volume,
sq2.transaction_count
FROM
(SELECT DISTINCT
PRIMARY_MER_NUM ,
PRIMARY_EXT_MID ,
MER_DBA_NAM,
CLG_NUM,
ENT_NUM,
ENT_NAM,
MER_OPN_DTE,
MER_CLS_DTE,
MER_FST_DPST_DTE,
CLG_NUM ,
ENT_NUM
FROM
bi.t_mer_dim_na
WHERE
CLG_NUM = 7
AND ENT_NUM IN ('45810', '45811', '46849', '45948', '45824',
'46911', '45509', '46845', '48902')
) sq1
LEFT JOIN
(SELECT
TRUNC(BAT_REF_DTE, 'MM') AS month,
MER_NUM,
SUM(bat_prd_trn_dr_amt + bat_prd_trn_cr_amt) AS gross_volume,
SUM(bat_item_num) AS transaction_count
FROM
TDS.BAT_T3
WHERE
1 = 1
AND bat_ref_dte >= TRUNC(sysdate, 'MM')
GROUP BY
TRUNC(BAT_REF_DTE, 'MM'), MER_NUM) SQ2 ON sq1.primary_mer_num = sq2.MER_NUM;
You have CLG_NUM and ENT_NUM selected twice in your first derived table, sq1:
FROM (
select DISTINCT
PRIMARY_MER_NUM ,
PRIMARY_EXT_MID ,
MER_DBA_NAM,
CLG_NUM, --1
ENT_NUM, --1
ENT_NAM,
MER_OPN_DTE,
MER_CLS_DTE,
MER_FST_DPST_DTE,
CLG_NUM, --2
ENT_NUM --2
from bi.t_mer_dim_na
That makes selecting sq1.CLG_NUM and sq1.ENT_NUM ambiguous in your outer SELECT (where you also select both twice).

Calculate a field based on 2 calculated colums in SQL

I have a query that sums a couple of values based on a common identifier known as a workcell. I'm trying to figure out how to add a column that calculates this formula as a percentage: (SumOfAct - SumOfStd) / (SumOfStd)
I was thinking some kind of subquery with inner joins would work, but I'm not sure how to get it looking right.
Here is my code that gets everything I want except for that calculated column:
SELECT v_MES_OrderIssues.AssignedWorkcell
, CONVERT(Decimal(10,2), Sum(v_SAP_OrderOperations.Std)) AS SumOfStd
, CONVERT(Decimal(10,2), Sum(v_SAP_OrderOperations.Act)) AS SumOfAct
, CONVERT(Decimal(10,2), Sum(v_SAP_OrderOperations.Variance)) AS SumOfVariance
FROM (v_SAP_OrderOperations
LEFT JOIN v_SAP_Orders ON v_SAP_OrderOperations.Ordr = v_SAP_Orders.Ordr)
LEFT JOIN v_MES_OrderIssues ON v_SAP_OrderOperations.Ordr = v_MES_OrderIssues.WOrder
WHERE (((v_SAP_Orders.OpenOrder) Like '1')
AND ((v_SAP_Orders.Equipment) Is Not NULL)
AND ((v_SAP_OrderOperations.ACT)>0))
AND ((v_MES_OrderIssues.AssignedWorkcell) Like 'S5H%W')
AND ((v_MES_OrderIssues.DateTimeClosed) Is Null)
OR (((v_SAP_Orders.OpenOrder) Like '1')
AND ((v_SAP_Orders.Equipment) Is Not NULL)
AND ((v_SAP_OrderOperations.OpenOp) Like '0'))
AND ((v_MES_OrderIssues.AssignedWorkcell) Like 'S5H%W')
AND ((v_MES_OrderIssues.DateTimeClosed) Is Null)
GROUP BY v_MES_OrderIssues.AssignedWorkcell
ORDER BY Sum(v_SAP_OrderOperations.Variance) DESC
If I understood correctly, you can do it directly in the SELECT clause:
SELECT v_MES_OrderIssues.AssignedWorkcell
, CONVERT(Decimal(10,2), Sum(v_SAP_OrderOperations.Std)) AS SumOfStd
, CONVERT(Decimal(10,2), Sum(v_SAP_OrderOperations.Act)) AS SumOfAct
, CONVERT(Decimal(10,2), Sum(v_SAP_OrderOperations.Variance)) AS SumOfVariance
, CONVERT(Decimal(10,2), (Sum(v_SAP_OrderOperations.Act) - Sum(v_SAP_OrderOperations.Std))/ Sum(v_SAP_OrderOperations.Std)) AS percentage
...
BTW,
LEFT JOIN v_SAP_Orders
...
WHERE (((v_SAP_Orders.OpenOrder) Like '1')
will effectively be an INNER JOIN, because the WHERE predicate rejects the NULL values that the left join produces for unmatched rows. You may wish to move the predicate to the ON clause to keep it left-joined.
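A minimal sketch of that difference, using Python's built-in sqlite3 module and two hypothetical tables (operations and orders, with made-up rows; none of these names come from the original views). The same predicate is placed first in WHERE and then in ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE operations (ordr TEXT, act REAL);
    CREATE TABLE orders (ordr TEXT, open_order TEXT);
    INSERT INTO operations VALUES ('A', 5.0), ('B', 7.0), ('C', 1.0);
    -- Hypothetical data: order B is closed; order C has no row in orders.
    INSERT INTO orders VALUES ('A', '1'), ('B', '0');
""")

# Predicate in WHERE: rows whose o.open_order is NULL or '0' are filtered out
# after the join, so the LEFT JOIN behaves like an INNER JOIN.
in_where = conn.execute("""
    SELECT op.ordr, o.open_order
    FROM operations op
    LEFT JOIN orders o ON op.ordr = o.ordr
    WHERE o.open_order = '1'
    ORDER BY op.ordr
""").fetchall()

# Predicate in ON: every operations row survives; rows without a matching
# open order carry NULL in the orders columns.
in_on = conn.execute("""
    SELECT op.ordr, o.open_order
    FROM operations op
    LEFT JOIN orders o ON op.ordr = o.ordr AND o.open_order = '1'
    ORDER BY op.ordr
""").fetchall()

print(in_where)
print(in_on)
```

Which form is right depends on whether operations without an open order should appear in the result at all; if they should not, writing INNER JOIN states that intent more plainly.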

Use of MAX function in SQL query to filter data

The code below joins two tables and I need to extract only the latest date per account, though it holds multiple accounts and history records. I wanted to use the MAX function, but not sure how to incorporate it for this case. I am using My SQL server.
Appreciate any help !
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from
Property.dbo.PROP
inner join
Property.dbo.PROP_DATA on Property.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where
(PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
order by
PROP.EffDate DESC
Assuming your DBMS supports windowing functions and the with clause, a max windowing function would work:
with all_data as (
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label,
max (PROP.EffDate) over (partition by PROP.PolNo) as max_date
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where (PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
)
select
FileName, InsName, Status, FileTime, SubmissionNo,
PolNo, EffDate, ExpDate, Region, UnderWriter, Data, Label
from all_data
where EffDate = max_date
ORDER BY EffDate DESC
This also presupposes that any given account would not have two records on the same EffDate. If that can happen, and there is no other objective means to determine the latest record, you could also use row_number to pick a somewhat arbitrary record in the case of a tie.
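A minimal sketch of the row_number variant, run through Python's built-in sqlite3 module (this assumes an SQLite build of 3.25 or later, which is when window functions were added). The prop table, its lowercase column names, and the rows are hypothetical, and using file_time as the tie-breaker is an assumption about what "latest" should mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prop (file_id INTEGER, pol_no TEXT, eff_date INTEGER, file_time INTEGER);
    INSERT INTO prop VALUES
        (1, 'POL-1', 42300, 100),
        (2, 'POL-1', 42400, 110),
        (3, 'POL-1', 42400, 120),  -- ties with file 2 on eff_date
        (4, 'POL-2', 42500, 130);
""")

# Number the rows of each policy from newest to oldest; file_time breaks
# ties on eff_date, so exactly one row per policy gets rn = 1.
latest = conn.execute("""
    WITH ranked AS (
        SELECT file_id, pol_no, eff_date,
               ROW_NUMBER() OVER (
                   PARTITION BY pol_no
                   ORDER BY eff_date DESC, file_time DESC
               ) AS rn
        FROM prop
    )
    SELECT file_id, pol_no, eff_date
    FROM ranked
    WHERE rn = 1
    ORDER BY pol_no
""").fetchall()

print(latest)
```

Unlike the max(...) over (...) comparison, this never returns two rows for one policy, at the cost of having to pick a tie-breaking column.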
Using straight SQL, you can use a self-join in a subquery in your where clause to eliminate values smaller than the max, or smaller than the top n largest, and so on. Just set the number in <= 1 to the number of top values you want per group.
Something like the following might do the trick, for example:
select
p.FileName
, p.InsName
, p.Status
, p.FileTime
, p.SubmissionNo
, p.PolNo
, p.EffDate
, p.ExpDate
, p.Region
, p.Underwriter
, pd.Data
, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
where (
select count(*)
from Actuarial.dbo.PROP p2
where p2.FileID = p.FileID
and p2.EffDate >= p.EffDate
) <= 1
and (
pd.Label in ('Occupancy' , 'OccupancyTIV')
and p.Status = 'Bound'
)
ORDER BY p.EffDate DESC
Have a look at this stackoverflow question for a full working example.
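The counting technique can also be checked on toy data. The sketch below uses Python's built-in sqlite3 module with a hypothetical prop table (table, columns, and rows are made up, and pol_no is assumed to be the column that identifies an account). A row is kept when at most n rows in its group have an eff_date at or above its own, which yields the n latest rows per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prop (pol_no TEXT, eff_date INTEGER);
    INSERT INTO prop VALUES
        ('POL-1', 42300), ('POL-1', 42400), ('POL-1', 42500),
        ('POL-2', 42350), ('POL-2', 42450);
""")

def top_n_per_policy(n):
    """Return the n latest eff_date rows per pol_no via a correlated count."""
    return conn.execute("""
        SELECT p.pol_no, p.eff_date
        FROM prop p
        WHERE (
            SELECT COUNT(*)
            FROM prop p2
            WHERE p2.pol_no = p.pol_no
              AND p2.eff_date >= p.eff_date
        ) <= ?
        ORDER BY p.pol_no, p.eff_date DESC
    """, (n,)).fetchall()

print(top_n_per_policy(1))
print(top_n_per_policy(2))
```

With tied dates the count for each tied row includes its twin, so tied rows at the cutoff can all be excluded; a window function with an explicit tie-breaker avoids that.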
Not tested
with temp1 as
(
-- latest FileTime per policy (assumes PolNo identifies the account)
select PolNo, max(FileTime) as MaxFileTime
from Actuarial.dbo.PROP
group by PolNo
)
select p.FileName, p.InsName, p.Status,
p.FileTime, p.SubmissionNo, p.PolNo,
p.EffDate, p.ExpDate, p.Region,
p.Underwriter, pd.Data, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
inner join temp1 t
on p.PolNo = t.PolNo and p.FileTime = t.MaxFileTime
ORDER BY p.EffDate DESC

Why I need Group by in this simple query?

UPDATE :
-----
the error might be in sum(si.amt_pd) from item table (as there is no relation) :
select SUM(si.amt_pd)amt_pd from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where
is there a work around?
----------
I am trying to run this query. The query just fetches the amount of a month based on some tables. It is just a part of a big query.
select s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt) 'YEAR'
, MONTH(s.Sale_Dt) 'MONTH'
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd
, jan2011 = (
select SUM(si.amt_pd)amt_pd
from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where i.Item_Id = si.Product_Item_ID
and i.Item_Cd <> '*INT'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-01'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-31'
) INTO dbo.#a_acomparision
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S
, [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
WHERE SI.Sales_Id = S.Sales_Id
and s.Sales_Contract_Nbr in (
select distinct (Sales_Contract_Nbr)
from mountainviewContracts
where Sales_Contract_Nbr <> '')
but I am getting the following error message.
Msg 8120, Level 16, State 1, Line 1
Column 'HMIS_REPORTING.HMIS_RPT_ME.dbo.Sales.Sales_Contract_Nbr' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I just can't understand why my query should have a group by for sales_contract_nbr and even if I put in the group by clause it tells me that inner query si.Product_item_id and SI.sales_item_dt should also be contained in group by clause.
Please help me out.
Thanks in advance
This is a very subtle problem. However, I think the subquery should be:
select SUM(i.amt_pd)amt_pd from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
That is, the alias should be i not si.
What is happening is that the sum in the subquery is on a value in the outer query. So, the SQL compiler assumes an aggregation query. As soon as the first column is found that is not an aggregation, it complains with the message that you have.
By the way, you should use proper join syntax, so your FROM clause looks like:
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S join
[HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
on SI.Sales_Id = S.Sales_Id
As @Gordon Linoff says, this is almost certainly because the query optimizer is treating this like a SUM operation, normalizing away the subquery for "jan2011".
If the amt_pd column is present in the ITEM table, Gordon's solution is the right one.
If not, you have to add the group by statement, as below.
select s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt) 'YEAR'
, MONTH(s.Sale_Dt) 'MONTH'
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd
, jan2011 = (
select SUM(si.amt_pd)amt_pd
from [HMIS_REPORTING].HMIS_RPT_ME.dbo.item i
where i.Item_Id = si.Product_Item_ID
and i.Item_Cd <> '*INT'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-01'
and convert(varchar(10),SI.Sales_Item_Dt,126) >= '2011-01-31'
) INTO dbo.#a_acomparision
FROM [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales S
, [HMIS_REPORTING].HMIS_RPT_ME.dbo.Sales_Item SI
WHERE SI.Sales_Id = S.Sales_Id
and s.Sales_Contract_Nbr in (
select distinct (Sales_Contract_Nbr)
from mountainviewContracts
where Sales_Contract_Nbr <> '')
GROUP BY s.sales_Contract_Nbr
, s.Sales_Id
, s.Sale_Dt
, YEAR(s.Sale_Dt)
, MONTH(s.Sale_Dt)
, s.Sales_Need_TYpe_Cd
, s.Sales_Status_Cd
, si.Posted
, s.location_Cd