Inline Table Join Multiplying Results

The query below joins two views and one inline table to another inline table. When I run the query without table FI, all of the SUM values are correct. When I run it with table FI, all of the SUM values from vw_Interactions are multiplied and come back incorrect (the values from vw_LeadInteractions are not affected).
vw_Interactions is a transactional log that returns a 1 in each column where that measure is true (for example, a 1 is returned in I.[Call] where a phone call was logged). vw_LeadInteractions is the same, except that it returns the Client's ID instead of a 1.
I did several hours of research and found that inline tables can cause issues when joining (a Cartesian product?), but I could not work out how those answers apply to this query.
Can someone explain why including table FI multiplies the SUM values of everything from vw_Interactions, and how I can fix the query so that this does not happen?
This query is for my employer's outbound call center, to measure what happens during each 'round' of calling.
/* Parameters */
DECLARE @StartDatetime AS Date
SET @StartDatetime = '06/01/13'
DECLARE @EndDatetime AS Date
SET @EndDatetime = '05/31/14'
/* Dataset */
SELECT R.[RoundsGoal]
    ,R.[RoundNumber]
    ,COUNT(DISTINCT R.[Client_Id]) AS 'Leads'
    ,ISNULL(SUM(I.[Call]), 0) AS 'Calls'
    ,ISNULL(COUNT(DISTINCT LI.[Call]), 0) AS 'CallLeads'
    ,ISNULL(SUM(FI.[FirstCall]), 0) AS 'FirstCalls'
    ,ISNULL(SUM(I.[DecisionMakerCall]), 0) AS 'DecisionMakerCalls'
    ,ISNULL(COUNT(DISTINCT LI.[DecisionMakerCall]), 0) AS 'DecisionMakerCallLeads'
    ,ISNULL(SUM(FI.[FirstDecisionMakerCall]), 0) AS 'FirstDecisionMakerCalls'
    ,ISNULL(SUM(I.[LeftMessageCall]), 0) AS 'LeftMessageCalls'
    ,ISNULL(COUNT(DISTINCT LI.[LeftMessageCall]), 0) AS 'LeftMessageLeads'
    ,ISNULL(SUM(FI.[FirstLeftMessageCall]), 0) AS 'FirstLeftMessageCalls'
    ,ISNULL(SUM(I.[NoAnswerCall]), 0) AS 'NoAnswerCalls'
    ,ISNULL(COUNT(DISTINCT LI.[NoAnswerCall]), 0) AS 'NoAnswerCallLeads'
    ,ISNULL(SUM(FI.[FirstNoAnswerCall]), 0) AS 'FirstNoAnswerCalls'
FROM (
    SELECT RD.[Client_Id]
        ,ISNULL(UF1.[NumericCol], 0) AS 'RoundsGoal'
        ,COUNT(RD.[RoundDate]) OVER(PARTITION BY RD.[Client_Id] ORDER BY RD.[RoundDate] ASC) AS 'RoundNumber'
        ,RD.[RoundDate]
    FROM [dbo].[vw_RoundDates] RD
    LEFT JOIN [dbo].[AMGR_User_Fields] UF1 ON RD.[Client_Id] = UF1.[Client_Id] AND UF1.[Type_Id] = 140 --Rounds Goal TypeId
    LEFT JOIN [dbo].[AMGR_User_Field_Defs] UFD1 ON UF1.[Type_Id] = UFD1.[Type_Id] AND UF1.[Code_Id] = UFD1.[Code_Id]
    WHERE RD.[RoundDate] >= @StartDatetime AND RD.[RoundDate] <= @EndDatetime
) R
LEFT JOIN [dbo].[vw_Interactions] I ON R.[Client_Id] = I.[Client_Id] AND R.[RoundDate] = CAST(I.[Created] AS DATE)
LEFT JOIN [dbo].[vw_LeadInteractions] LI ON R.[Client_Id] = LI.[Client_Id] AND R.[RoundDate] = CAST(LI.[Created] AS DATE)
LEFT JOIN (
    SELECT I.[Client_Id]
        ,CASE WHEN (CASE WHEN I.[Call] = 1 THEN ROW_NUMBER() OVER(PARTITION BY I.[Client_Id], I.[Call] ORDER BY I.[Created] ASC) ELSE NULL END) = 1 THEN 1 ELSE NULL END AS 'FirstCall'
        ,CASE WHEN (CASE WHEN I.[DecisionMakerCall] = 1 THEN ROW_NUMBER() OVER(PARTITION BY I.[Client_Id], I.[DecisionMakerCall] ORDER BY I.[Created] ASC) ELSE NULL END) = 1 THEN 1 ELSE NULL END AS 'FirstDecisionMakerCall'
        ,CASE WHEN (CASE WHEN I.[LeftMessageCall] = 1 THEN ROW_NUMBER() OVER(PARTITION BY I.[Client_Id], I.[LeftMessageCall] ORDER BY I.[Created] ASC) ELSE NULL END) = 1 THEN 1 ELSE NULL END AS 'FirstLeftMessageCall'
        ,CASE WHEN (CASE WHEN I.[NoAnswerCall] = 1 THEN ROW_NUMBER() OVER(PARTITION BY I.[Client_Id], I.[NoAnswerCall] ORDER BY I.[Created] ASC) ELSE NULL END) = 1 THEN 1 ELSE NULL END AS 'FirstNoAnswerCall'
        ,[Created]
    FROM [dbo].[vw_Interactions] I
) FI ON R.[Client_Id] = FI.[Client_Id] AND R.[RoundDate] = CAST(FI.[Created] AS DATE)
GROUP BY R.[RoundsGoal]
    ,R.[RoundNumber]
ORDER BY R.[RoundsGoal] ASC
    ,R.[RoundNumber] ASC
Here is the correct result set, without table FI. Notice that Calls on row 23 equals 135,110.
Here is the incorrect result set, which includes table FI. Notice that Calls on row 23 is multiplied to 1,561,038.
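No answer to this question is preserved here. As a hedged illustration only, the symptoms are consistent with join fan-out: I and FI both return one row per interaction, so joining both on the same client and date pairs every I row with every FI row, and SUM counts each I value once per matching FI row. COUNT(DISTINCT ...) ignores such duplicates, which would explain why the LI columns look unaffected. The sketch below reproduces the effect with Python's sqlite3 standing in for SQL Server; the tables rounds and interactions, the column is_call, and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the round dates and the interaction log.
    CREATE TABLE rounds (client_id INTEGER, round_date TEXT);
    CREATE TABLE interactions (client_id INTEGER, created TEXT, is_call INTEGER);
    INSERT INTO rounds VALUES (1, '2013-06-03');
    INSERT INTO interactions VALUES
        (1, '2013-06-03', 1),
        (1, '2013-06-03', 1),
        (1, '2013-06-03', 1);
""")

# Joining the same detail rows twice (as i and as fi) pairs every i row
# with every fi row for the round: 3 x 3 = 9 joined rows.
fanned_out = conn.execute("""
    SELECT SUM(i.is_call)
    FROM rounds r
    LEFT JOIN interactions i
           ON r.client_id = i.client_id AND r.round_date = i.created
    LEFT JOIN interactions fi
           ON r.client_id = fi.client_id AND r.round_date = fi.created
""").fetchone()[0]

# Collapsing each source to one row per join key before joining
# leaves a single joined row per round, so the sums stay intact.
pre_aggregated = conn.execute("""
    SELECT i.calls
    FROM rounds r
    LEFT JOIN (
        SELECT client_id, created, SUM(is_call) AS calls
        FROM interactions
        GROUP BY client_id, created
    ) i ON r.client_id = i.client_id AND r.round_date = i.created
    LEFT JOIN (
        SELECT client_id, created, COUNT(*) AS n
        FROM interactions
        GROUP BY client_id, created
    ) fi ON r.client_id = fi.client_id AND r.round_date = fi.created
""").fetchone()[0]

print(fanned_out, pre_aggregated)
```

With three interactions on the round date, the double join sums to 9 while the pre-aggregated version sums to 3. Whether pre-aggregating I, LI and FI per Client_Id and date fits the real views is something only the real data can confirm.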

Related

How to nest multiple case when expressions and add a condition

I am trying to divide customers (the contact_key column) who shopped in 2021 (A.TXN_MTH) into 'new' and 'returning', with 'returning' meaning that they had not shopped in the previous 12 months (months are YYYYMM values in the X.FISCAL_MTH_IDNT column).
I am using CASE WHEN A.TXN_MTH = MIN(X.FISCAL_MTH_IDNT) THEN 'NEW', which is correct. The next CASE WHEN branch should match when the latest month before A.TXN_MTH is 12 or more months earlier. I have added the 12-month condition to the WHERE clause. Should I be nesting three CASE WHEN expressions instead of using WHERE?
SELECT
    T.CONTACT_KEY
    , A.TXN_MTH
    , CASE WHEN A.TXN_MTH = MIN(X.FISCAL_MTH_IDNT) THEN 'NEW'
           WHEN (MAX(CASE WHEN X.FISCAL_MTH_IDNT < A.TXN_MTH THEN X.FISCAL_MTH_IDNT ELSE NULL END)) THEN 'RETURNING'
      END AS CUST_TYPE
FROM B_TRANSACTION T
INNER JOIN B_TIME X
    ON T.TRANSACTION_DT_KEY = X.DATE_KEY
INNER JOIN A
    ON A.CONTACT_KEY = T.CONTACT_KEY AND A.BU_KEY = T.BU_KEY
WHERE (MAX(CASE WHEN X.FISCAL_MTH_IDNT < A.TXN_MTH THEN X.FISCAL_MTH_IDNT ELSE NULL END)) < A.TXN_MTH - (date_format(add_months(concat_ws('-',substr(yearmonth,1,4),substr(yearmonth,5,2),'01'),-12),'yyyyMM')
GROUP BY
    T.CONTACT_KEY
    , TXN_MTH;

You have not described your tables, so, assuming fiscal_mth_idnt is a DATE column, you can use the LAG analytic function to find the previous row's value:
SELECT contact_key,
       txn_mth,
       CASE
         WHEN prev_fiscal_mth_idnt IS NULL
         THEN 'NEW'
         WHEN ADD_MONTHS(prev_fiscal_mth_idnt, 12) < fiscal_mth_idnt
         THEN 'RETURNING'
         ELSE 'CURRENT'
       END AS cust_type
FROM   (
  SELECT T.CONTACT_KEY,
         A.TXN_MTH,
         yearmonth,
         X.FISCAL_MTH_IDNT,
         LAG(X.FISCAL_MTH_IDNT) OVER (
           PARTITION BY T.CONTACT_KEY
           ORDER BY X.FISCAL_MTH_IDNT
         ) AS prev_fiscal_mth_idnt
  FROM   B_TRANSACTION T
         INNER JOIN B_TIME X
         ON T.TRANSACTION_DT_KEY = X.DATE_KEY
         INNER JOIN A
         ON A.CONTACT_KEY = T.CONTACT_KEY AND A.BU_KEY = T.BU_KEY
)
WHERE  yearmonth LIKE '2021%';
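As a rough, runnable illustration of the LAG approach above, here is a minimal sketch using Python's sqlite3 (SQLite 3.25 or later is needed for window functions). The purchases table, its purchase_month column (ISO date text for the first day of each month) and the sample rows are hypothetical, and SQLite's date(x, '+12 months') stands in for ADD_MONTHS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: one row per customer per month with a purchase.
    CREATE TABLE purchases (contact_key INTEGER, purchase_month TEXT);
    INSERT INTO purchases VALUES
        (1, '2021-03-01'),                      -- first purchase ever
        (2, '2019-01-01'), (2, '2021-05-01'),   -- gap of more than 12 months
        (3, '2020-12-01'), (3, '2021-02-01');   -- gap of 2 months
""")

rows = conn.execute("""
    SELECT contact_key,
           purchase_month,
           CASE
               WHEN prev_month IS NULL THEN 'NEW'
               WHEN date(prev_month, '+12 months') < purchase_month THEN 'RETURNING'
               ELSE 'CURRENT'
           END AS cust_type
    FROM (
        -- LAG runs over the full history, before the 2021 filter is applied.
        SELECT contact_key,
               purchase_month,
               LAG(purchase_month) OVER (
                   PARTITION BY contact_key
                   ORDER BY purchase_month
               ) AS prev_month
        FROM purchases
    ) AS p
    WHERE purchase_month LIKE '2021%'
    ORDER BY contact_key
""").fetchall()

for row in rows:
    print(row)
```

The important detail is that the window function is evaluated in the inner query, so the previous purchase is found across all years and only then are the 2021 rows kept.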

CASE WHEN condition with MAX() function

There are a lot of questions on the CASE WHEN topic, but the closest to mine is "How to use CASE WHEN condition with MAX() function", which has not been resolved.
Here is some of my sample data:
date        debet
2022-07-15  57190.33
2022-07-14  815616516.00
2022-07-15  40866.67
2022-07-14  1221510.00
So, I want all records for the last two dates and three additional columns: sum(sales) for the previous day, the sum for the current day, and the difference between them:
SELECT
    [debet],
    [date],
    SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END ) AS sum_act,
    SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END ) AS sum_prev,
    (
        SUM( CASE WHEN [date] = MAX(date) THEN [debet] ELSE 0 END )
        -
        SUM( CASE WHEN [date] = MAX(date) - 1 THEN [debet] ELSE 0 END )
    ) AS diff
FROM
    Table
WHERE
    [date] = ( SELECT MAX(date) FROM Table WHERE date < ( SELECT MAX(date) FROM Table ) )
    OR
    [date] = ( SELECT MAX(date) FROM Table WHERE date = ( SELECT MAX(date) FROM Table ) )
GROUP BY
    [date],
    [debet]
Of course, the server then reports that I can't use an aggregate function inside CASE WHEN. For now I use this combination: sum(CASE WHEN [date] = dateadd(dd,-3,cast(getdate() as date)) THEN [debet] ELSE 0 END). But then I have to adjust it every time for weekends and holidays. Is there any way other than using getdate() inside CASE WHEN to get the max date?
Expected result:
date        sum_act   sum_prev   diff
2022-07-15  97190.33  0.00       97190.33
2022-07-14  0.00      508769.96  -508769.96

You can use dense_rank() to find the last 2 dates in your table. After that, you can use a conditional CASE expression with sum() to calculate the required values:
select [date],
       sum_act  = sum(case when rn = 1 then [debet] else 0 end),
       sum_prev = sum(case when rn = 2 then [debet] else 0 end),
       diff     = sum(case when rn = 1 then [debet] else 0 end)
                - sum(case when rn = 2 then [debet] else 0 end)
from
(
    select *, rn = dense_rank() over (order by [date] desc)
    from tbl
) t
where rn <= 2
group by [date]
db<>fiddle demo
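To try the dense_rank() idea without a SQL Server instance, here is a small sketch using Python's sqlite3 (SQLite 3.25 or later). It loads the four sample rows from the question plus one hypothetical older row that the rn <= 2 filter should drop. The column is renamed to txn_date here purely as an assumption of this sketch, and SQLite's AS aliases replace the T-SQL alias = expression syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (txn_date TEXT, debet REAL);
    INSERT INTO tbl VALUES
        ('2022-07-15', 57190.33),
        ('2022-07-14', 815616516.00),
        ('2022-07-15', 40866.67),
        ('2022-07-14', 1221510.00),
        ('2022-07-13', 100.00);  -- hypothetical older row, should be filtered out
""")

rows = conn.execute("""
    SELECT txn_date,
           SUM(CASE WHEN rn = 1 THEN debet ELSE 0 END) AS sum_act,
           SUM(CASE WHEN rn = 2 THEN debet ELSE 0 END) AS sum_prev,
           SUM(CASE WHEN rn = 1 THEN debet ELSE 0 END)
         - SUM(CASE WHEN rn = 2 THEN debet ELSE 0 END) AS diff
    FROM (
        -- DENSE_RANK gives every row of the latest date rn = 1,
        -- every row of the date before it rn = 2, and so on.
        SELECT txn_date, debet,
               DENSE_RANK() OVER (ORDER BY txn_date DESC) AS rn
        FROM tbl
    ) AS t
    WHERE rn <= 2
    GROUP BY txn_date
    ORDER BY txn_date DESC
""").fetchall()

for row in rows:
    print(row)
```

Because the ranking is driven by the dates actually present in the table, weekends and holidays need no special handling: the "previous" date is simply the next date that has rows.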

Two steps:
1. Get the sums for the last three dates.
2. Show the results for the last two dates.
Well, we could also get all daily sums in step 1, but we just need the last three in order to calculate the sums for the last two days, so why aggregate more data than necessary?
Here is the query. You may have to put the date column name in brackets in SQL Server, as date is a keyword in SQL.
select top(2)
  date,
  sum_debit_current,
  sum_debit_previous,
  sum_debit_current - sum_debit_previous as diff
from
(
  select
    date,
    sum(debet) as sum_debit_current,
    lag(sum(debet)) over (order by date) as sum_debit_previous
  from table
  where date in (select distinct top(3) date from table order by date desc)
  group by date
) daily
order by date desc;
(SQL Server uses TOP(n) instead of standard SQL FETCH FIRST 3 ROWS and while SELECT DISTINCT TOP(3) date looks like "get the top 3 rows, then apply distinct on their date", it is really "apply distinct on the dates, then get the top 3" like in standard SQL.)

SQL - Group data with same ID and Date that has been to every Machine but has a different Name

I am trying to create a query that will group data by CT ID and Date, returning the groups that have all 3 MachineIDs (1, 10, and 20) and at least one different Sawing Pattern Name.
This image shows a highlighted example of the data I'm trying to get back and the code I'm currently using.
I'm trying to show only data similar to the highlighted rows in the image (CT ID 501573833) and exclude the rows around it where the Sawing Pattern Name is the same at all 3 MachineIDs.

Your description suggests group by and having. The conditions you describe can all go in the having clause:
select ct_id, date
from t
group by ct_id, date
having sum(case when machineid = 1 then 1 else 0 end) > 0 and
       sum(case when machineid = 10 then 1 else 0 end) > 0 and
       sum(case when machineid = 20 then 1 else 0 end) > 0 and
       min(sawing_pattern_name) <> max(sawing_pattern_name)
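The HAVING conditions above can be checked on a tiny dataset. This is a minimal sketch using Python's sqlite3; the table t, the column run_date (used instead of date, as an assumption of this sketch) and all rows except the CT ID taken from the question are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (ct_id INTEGER, run_date TEXT, machineid INTEGER,
                    sawing_pattern_name TEXT);
    INSERT INTO t VALUES
        -- all three machines, patterns differ: should be returned
        (501573833, '2022-01-01', 1,  'A'),
        (501573833, '2022-01-01', 10, 'A'),
        (501573833, '2022-01-01', 20, 'B'),
        -- all three machines, same pattern everywhere: excluded
        (2, '2022-01-01', 1,  'A'),
        (2, '2022-01-01', 10, 'A'),
        (2, '2022-01-01', 20, 'A'),
        -- patterns differ but machine 20 is missing: excluded
        (3, '2022-01-01', 1,  'A'),
        (3, '2022-01-01', 10, 'B');
""")

rows = conn.execute("""
    SELECT ct_id, run_date
    FROM t
    GROUP BY ct_id, run_date
    HAVING SUM(CASE WHEN machineid = 1  THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN machineid = 10 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN machineid = 20 THEN 1 ELSE 0 END) > 0
       -- MIN <> MAX is true only when at least two distinct names exist
       AND MIN(sawing_pattern_name) <> MAX(sawing_pattern_name)
""").fetchall()

print(rows)
```

Only the group that visited machines 1, 10 and 20 with more than one pattern name survives. To get the detail rows rather than the keys, this result can be joined back to the table on ct_id and the date.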

Seems to me that an EXISTS could be useful here.
SELECT
    [CT ID],
    [MachineID],
    [Sawing Pattern name],
    [Time],
    CAST([Time] AS DATE) AS [Date]
FROM [DataCollector].[dbo].[Maxicut] t
WHERE EXISTS
(
    SELECT 1
    FROM [DataCollector].[dbo].[Maxicut] d
    WHERE d.[CT ID] = t.[CT ID]
      AND CAST(d.[Time] AS DATE) = CAST(t.[Time] AS DATE)
      AND d.[MachineID] != t.[MachineID]
      AND REPLACE(d.[Sawing Pattern name],',','') != REPLACE(t.[Sawing Pattern name],',','')
);

SQL - Inserting a condition in a GROUP BY

My issue is that some of the records in the result set are excluded because they are missing a Min_Date, a Max_Date, or both. I need these records to be included so that I can show that the runner ran in the race even if he did not reach a first, last, or any checkpoint. Any direction is appreciated.
SELECT A.Date, A.RunnerName, A.Duplicates, A.TotalWaypointsReached,
       B.FirstWaypoint, C.LastWaypoint, C.rDateTime as MostRecent
FROM (
    SELECT RunnerName,
           CONVERT(NVARCHAR(25), rDatetime, 101) AS Date,
           Min(case when FirstWaypoint is null OR FirstWaypoint = '' then null else rDateTime end) MIN_DATE,
           Max(case when LastWaypoint is null OR LastWaypoint = '' then null else rDateTime end) MAX_DATE,
           --IF(I'm missing a Max_Date, Min_Date, or both after all records in a group. Add Max(rDateTime) and Min(rDateTime))
           Count(*) AS Duplicates,
           SUM(TotalWaypoints) as TotalWaypointsReached
    FROM Race A
    GROUP BY RunnerName, CONVERT(NVARCHAR(25), rDateTime, 101)
    HAVING Count(*) > 1 ) A
LEFT JOIN Race B
    on A.RunnerName = B.RunnerName
    and A.MIN_DATE = B.rDateTime
LEFT JOIN Race C
    on A.RunnerName = C.RunnerName
    and A.MAX_DATE = C.rDatetime
I'm using the select statement via SQL command in Visual Studio 2008.

multi-select sql query with date range

I have this query where I get totals of different stats from an employee roster table.
SELECT A.rempid AS EmpId,
       E.flname,
       A.rdo_total,
       B.grave_total,
       C.sundays,
       D.holidays
FROM   (SELECT rempid,
               Count(rshiftid) AS RDO_Total
        FROM   rtmp1
        WHERE  rshiftid = 2
        GROUP  BY rempid
        HAVING Count(rshiftid) > 0) A,
       (SELECT rempid,
               Count(rshiftid) AS Grave_Total
        FROM   rtmp1
        WHERE  rshiftid = 6
        GROUP  BY rempid
        HAVING Count(rshiftid) > 0) B,
       (SELECT rempid,
               Count(rshiftid) AS Sundays
        FROM   rtmp1
        WHERE  Datepart(dw, rdate) = 1
               AND rshiftid > 2
        GROUP  BY rempid
        HAVING Count(rshiftid) > 0) C,
       (SELECT rempid,
               Count(rshiftid) AS Holidays
        FROM   rtmp1
        WHERE  rdate IN (SELECT pubhdt
                         FROM   pubhol)
               AND rshiftid > 2
        GROUP  BY rempid
        HAVING Count(rshiftid) > 0) D,
       (SELECT empid,
               [fname] + ' ' + [sname] AS flName
        FROM   remp1) E
WHERE  A.rempid = B.rempid
       AND A.rempid = E.empid
       AND A.rempid = C.rempid
       AND A.rempid = D.rempid
ORDER  BY A.rempid
I would like to add a date range to it, so that I can query the database between two dates. The rTmp1 table has a column called rDate. What is the best way to do this? I could put the query in a stored procedure and add variables to each select query. Or is there a better way to run the query within a date range?

I think you can just add an additional WHERE clause condition similar to:
AND ( rDate > somedate AND rDate < someotherdate )

Adding the date range to each query is the most direct solution.
Making it a stored procedure is something that can always be done with a query, but has nothing to do with this specific case.
If the number of records resulting from narrowing down your table to the specified date range is substantially less than the entire table, it might be an option to insert these records into a temporary table or a table variable and run your existing query on that table/resultset.
Though I do not have any data to test with, you might consider the following query, as it is easier to read and might perform better. You will have to check the results yourself and may need to make some adjustments.
DECLARE @startDate date = '12/01/2012'
DECLARE @endDate date = DATEADD(MONTH, 1, @startDate)
SELECT
    [e].[empid],
    [e].[fname] + ' ' + [e].[sname] AS [flName],
    SUM(CASE WHEN [t].[rshiftid] = 2 THEN 1 ELSE 0 END) AS [RDO_Total],
    SUM(CASE WHEN [t].[rshiftid] = 6 THEN 1 ELSE 0 END) AS [Grave_Total],
    SUM(CASE WHEN [t].[rshiftid] > 2 AND DATEPART(dw, [t].[rdate]) = 1 THEN 1 ELSE 0 END) AS [Sundays],
    SUM(CASE WHEN [t].[rshiftid] > 2 AND [h].[pubhdt] IS NOT NULL THEN 1 ELSE 0 END) AS [Holidays]
FROM [remp1] [e]
INNER JOIN [rtmp1] [t] ON [e].[empid] = [t].[rempid]
LEFT JOIN [pubhol] [h] ON [t].[rdate] = [h].[pubhdt]
WHERE [t].[rdate] BETWEEN @startDate AND @endDate
GROUP BY
    [e].[empid],
    [e].[fname],
    [e].[sname]
ORDER BY [empid] ASC
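For anyone who wants to experiment with the single-pass version, here is a hedged sketch in Python's sqlite3 with made-up roster rows. Two things differ from the T-SQL above and are assumptions of this sketch: the date filter is written as a half-open range (>= start, < end) so that the first day of the following month is not counted, which BETWEEN would include, and Sunday is detected with strftime('%w', ...) = '0' because SQLite has no DATEPART.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE remp1 (empid INTEGER, fname TEXT, sname TEXT);
    CREATE TABLE rtmp1 (rempid INTEGER, rshiftid INTEGER, rdate TEXT);
    CREATE TABLE pubhol (pubhdt TEXT);
    -- Hypothetical employee, roster rows and public holiday.
    INSERT INTO remp1 VALUES (1, 'Ann', 'Lee');
    INSERT INTO rtmp1 VALUES
        (1, 2, '2012-12-03'),   -- rshiftid 2, counted in RDO_Total
        (1, 6, '2012-12-04'),   -- rshiftid 6, counted in Grave_Total
        (1, 3, '2012-12-09'),   -- a Sunday shift
        (1, 3, '2012-12-25'),   -- a shift on a public holiday
        (1, 2, '2013-02-01');   -- outside the requested range
    INSERT INTO pubhol VALUES ('2012-12-25');
""")

start_date, end_date = "2012-12-01", "2013-01-01"  # end date is exclusive

rows = conn.execute("""
    SELECT e.empid,
           e.fname || ' ' || e.sname AS flName,
           SUM(CASE WHEN t.rshiftid = 2 THEN 1 ELSE 0 END) AS RDO_Total,
           SUM(CASE WHEN t.rshiftid = 6 THEN 1 ELSE 0 END) AS Grave_Total,
           SUM(CASE WHEN t.rshiftid > 2 AND strftime('%w', t.rdate) = '0'
                    THEN 1 ELSE 0 END) AS Sundays,
           SUM(CASE WHEN t.rshiftid > 2 AND h.pubhdt IS NOT NULL
                    THEN 1 ELSE 0 END) AS Holidays
    FROM remp1 e
    INNER JOIN rtmp1 t ON e.empid = t.rempid
    LEFT JOIN pubhol h ON t.rdate = h.pubhdt
    WHERE t.rdate >= ? AND t.rdate < ?
    GROUP BY e.empid, e.fname, e.sname
    ORDER BY e.empid
""", (start_date, end_date)).fetchall()

print(rows)
```

The date range appears exactly once, as two bound parameters, instead of being repeated in each of the four subqueries. Note one behavioural difference from the original comma-join query: an employee with zero rows in one category still appears here with a 0, whereas the original drops such employees entirely.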
