Adding back some original info for historical purposes as I thought simplifying would help but it didn't. We have this stored procedure, in this part it is selecting records from table A (calldetail_reporting_agents) and doing a left join on table B (Intx_Participant). Apparently there are duplicate rows in table B being pulled that we DON'T want. Is there any easy way to change this up to only pick the first match on table B? Or will I need to rewrite the whole thing?
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
LEFT JOIN Intx_Participant ON calldetail_reporting_agents.CallID = Intx_Participant.CallIDKey
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN #LocStartDate AND #LocEndDate
AND (#LocQueue IS NULL OR AssignedWorkGroup = #LocQueue)
Simpler version: how to change below to select only first matching row from table B:
SELECT columnA, columnB FROM TableA LEFT JOIN TableB ON someColumn
I changed to this per the first answer and all data seems to look exactly as expected now. Thank you to everyone for the quick and attentive help.
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM Intx_Participant ip
WHERE calldetail_reporting_agents.CallID = ip.CallIDKey
AND calldetail_reporting_agents.RemoteNumber = ip.ConnValue
AND ip.HowEnded = '9'
AND ip.Recorded = '0'
AND ip.Duration > 0
AND ip.Role = '1') Intx_Participant
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN #LocStartDate AND #LocEndDate
AND (#LocQueue IS NULL OR AssignedWorkGroup = #LocQueue)
You can try to OUTER APPLY a subquery getting only one matching row.
...
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM intx_Participant ip
WHERE ip.callidkey = calldetail_reporting_agents.callid) intx_participant
WHERE ...
You should add an ORDER BY in the subquery. Otherwise it isn't deterministic which row is taken as the first. Or maybe that's not an issue.
Related
My weakest area of SQL are self JOINS, currently struggling with an issue.
I need to find the latest entry in a table, I'm using a WHERE DATEFIELD IN (SELECT MAX(DATEFIELD) FROM TABLE) to do this. I then need to establish if 3 columns from that already exist in the same TABLE.
My latest attempt looks like this -
SELECT * FROM PART_TABLE
WHERE NOT EXISTS
(
SELECT
t1.DATEFIELD
t1.CODE1
t1.CODE2
t1.CODE3
FROM PART_TABLE t1
INNER JOIN PART_TABLE t2 ON t1.UNIQUE = t2.UNQIUE
)
WHERE t1.DATEFIELD IN
(
SELECT MAX(DATEFIELD)
FROM PARTTABLE
)
)
I think part of the issue is that I can't exclude the unique row from t1 when checking in t2 using this method.
Using MSSQL 2014.
The following query will return the latest record from your table and a bit flag whether a duplicate tuple {Code1, Code2, Code3} exists in it under a different identifier:
select top (1) p.*,
case when exists (
select 0 from dbo.Part_Table t where t.Unique != p.Unique
and t.Code1 = p.Code1 and t.Code2 = p.Code2 and t.Code3 = p.Code3
) then 1
else 0 end as [IsDuplicateExists]
from dbo.Part_Table p
order by p.DateField desc;
You can use this example as a template to address your specific needs, which unfortunately aren't immediately apparent from your explanation.
I have a question on sql desgin.
Context:
I have a table called t_master and 13 other tables (lets call them a,b,c... for simplicity) where it needs to compared.
Logic:
t_master will be compared to table 'a' where t_master.gen_val =
a.value.
If record exist in t_master, retrieve t_master record, else retrieve 'a' record.
I do not need to retrieve the records if it exists in both tables (t_master and a) - XOR condition
Repeat this comparison with the remaining 12 tables.
I have some idea on doing this, using WITH to subquery the non-master tables (a,b,c...) first with their respective WHERE clause.
Then use XOR statement to retrieve the records.
Something like
WITH a AS (SELECT ...),
b AS (SELECT ...)
SELECT field1,field2...
FROM t_master FULL OUTER JOIN a FULL OUTER JOIN b FULL OUTER JOIN c...
ON t_master.gen_value = a.value
WHERE ((field1 = x OR field2 = y ) AND NOT (field1 = x AND field2 = y))
AND ....
.
.
.
.
Seeing that I have 13 tables that I need to full outer join, is there a better way/design to handle this?
Otherwise I would have at least 2*13 lines of WHERE clause which I'm not sure if that will have impact on the performance as t_master is sort of a log table.
**Assume I cant change any schema.
Currently I'm not sure if this SQL will working correctly yet, so I'm hoping someone can guide me in the right direction regarding this.
update from used_by_already's suggestion:
This is what I'm trying to do (comparison between 2 tables first, before I add more, but I am unable to get values from ATP_R.TBL_HI_HDR HI_HDR as it is in the NOT EXISTS subquery.
How do i overcome this?
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO JOIN ATP_R.TBL_HI_HDR HI_HDR ON LOG_REPO.GEN_VAL = HI_HDR.HI_NO
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_R.TBL_HI_HDR HI_HDR
WHERE LOG_REPO.GEN_VAL = HI_HDR.HI_NO
)
UNION ALL
SELECT LOG_REPO.UNIQ_ID,
LOG_REPO.REQUEST_PAYLOAD,
LOG_REPO.GEN_VAL,
LOG_REPO.CREATED_BY,
TO_CHAR(LOG_REPO.CREATED_DT,'DD/MM/YYYY') AS CREATED_DT,
HI_HDR.HI_NO R_VALUE,
HI_HDR.CREATED_BY R_CREATED_BY,
TO_CHAR(HI_HDR.CREATED_DT,'DD/MM/YYYY') AS R_CREATED_DT
FROM ATP_R.TBL_HI_HDR HI_HDR JOIN ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO ON HI_HDR.HI_NO = LOG_REPO.GEN_VAL
WHERE NOT EXISTS
(SELECT NULL
FROM ATP_COMMON.VW_CMN_LOG_GEN_REPO LOG_REPO
WHERE HI_HDR.HI_NO = LOG_REPO.GEN_VAL
)
Full outer joins used to exclude all matching rows can be an expensive query. You don't supply much detail, but perhaps using NOT EXISTS would be simpler and maybe it will produce a better explain plan. Something along these lines.
select
cola,colb,colc
from t_master m
where not exists (
select null from a where m.keycol = a.fk_to_m
)
and not exists (
select null from b where m.keycol = b.fk_to_m
)
and not exists (
select null from c where m.keycol = c.fk_to_m
)
union all
select
cola,colb,colc from a
where not exists (
select null from t_master m where a.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from b
where not exists (
select null from t_master m where b.fk_to_m = m.keycol
)
union all
select
cola,colb,colc from c
where not exists (
select null from t_master m where c.fk_to_m = m.keycol
)
You could union the 13 a,b,c ... tables to simplify the coding, but that may not perform so well.
I have a simple SQL table containing some values, for example:
id | value (table 'values')
----------
0 | 4
1 | 7
2 | 9
I want to iterate over these values, and use them in a query like so:
SELECT value[0], x1
FROM (some subquery where value[0] is used)
UNION
SELECT value[1], x2
FROM (some subquery where value[1] is used)
...
etc
In order to get a result set like this:
4 | x1
7 | x2
9 | x3
It has to be in SQL as it will actually represent a database view. Of course the real query is a lot more complicated, but I tried to simplify the question while keeping the essence as much as possible.
I think I have to select from values and join the subquery, but as the value should be used in the subquery I'm lost on how to accomplish this.
Edit: I oversimplified my question; in reality I want to have 2 rows from the subquery and not only one.
Edit 2: As suggested I'm posting the real query. I simplified it a bit to make it clearer, but it's a working query and the problem is there. Note that I have hardcoded the value '2' in this query two times. I want to replace that with values from a different table, in the example table above I would want a result set of the combined results of this query with 4, 7 and 9 as values instead of the currently hardcoded 2.
SELECT x.fantasycoach_id, SUM(round_points)
FROM (
SELECT DISTINCT fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
LEFT JOIN fantasyworld_fantasyformation AS ff ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
AND _rr.sequence <= 2 /* HARDCODED USE OF VALUE */
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
AND fpc.round_sequence = 2 /* HARDCODED USE OF VALUE */
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Edit 3: I'm using PostgreSQL.
SQL works with tables as a whole, which basically involves set operations. There is no explicit iteration, and generally no need for any. In particular, the most straightforward implementation of what you described would be this:
SELECT value, (some subquery where value is used) AS x
FROM values
Do note, however, that a correlated subquery such as that is very hard on query performance. Depending on the details of what you're trying to do, it may well be possible to structure it around a simple join, an uncorrelated subquery, or a similar, better-performing alternative.
Update:
In view of the update to the question indicating that the subquery is expected to yield multiple rows for each value in table values, contrary to the example results, it seems a better approach would be to just rewrite the subquery as the main query. If it does not already do so (and maybe even if it does) then it would join table values as another base table.
Update 2:
Given the real query now presented, this is how the values from table values could be incorporated into it:
SELECT x.fantasycoach_id, SUM(round_points) FROM
(
SELECT DISTINCT
fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
-- one row for each combination of coach and value:
CROSS JOIN values
LEFT JOIN fantasyworld_fantasyformation AS ff
ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr
ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff
ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
-- use the value obtained from values:
AND _rr.sequence <= values.value
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
-- use the value obtained from values again:
AND fpc.round_sequence = values.value
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Note in particular the CROSS JOIN which forms the cross product of two tables; this is the same thing as an INNER JOIN without any join predicate, and it can be written that way if desired.
The overall query could be at least a bit simplified, but I do not do so because it is a working example rather than an actual production query, so it is unclear what other changes would translate to the actual application.
In the example I create two tables. See how outer table have an alias you use in the inner select?
SQL Fiddle Demo
SELECT T.[value], (SELECT [property] FROM Table2 P WHERE P.[value] = T.[value])
FROM Table1 T
This is a better way for performance
SELECT T.[value], P.[property]
FROM Table1 T
INNER JOIN Table2 p
on P.[value] = T.[value];
Table 2 can be a QUERY instead of a real table
Third Option
Using a cte to calculate your values and then join back to the main table. This way you have the subquery logic separated from your final query.
WITH cte AS (
SELECT
T.[value],
T.[value] * T.[value] as property
FROM Table1 T
)
SELECT T.[value], C.[property]
FROM Table1 T
INNER JOIN cte C
on T.[value] = C.[value];
It might be helpful to extract the computation to a function that is called in the SELECT clause and is executed for each row of the result set
Here's the documentation for CREATE FUNCTION for SQL Server. It's probably similar to whatever database system you're using, and if not you can easily Google for it.
Here's an example of creating a function and using it in a query:
CREATE FUNCTION DoComputation(#parameter1 int)
RETURNS int
AS
BEGIN
-- Do some calculations here and return the function result.
-- This example returns the value of #parameter1 squared.
-- You can add additional parameters to the function definition if needed
DECLARE #Result int
SET #Result = #parameter1 * #parameter1
RETURN #Result
END
Here is an example of using the example function above in a query.
SELECT v.value, DoComputation(v.value) as ComputedValue
FROM [Values] v
ORDER BY value
I am currently joining two tables based on Claim_Number and Customer_Number.
SELECT
A.*,
B.*,
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbp.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This returns exactly the data I'm looking for. The problem is that I now need to join to another table to get information based on a Product_ID. This would be easy if there was only one Product_ID in the Compound_Info table for each record. However, there are 10. So basically I need to SELECT 10 additional columns for Product_Name based on each of those Product_ID's that are being selected already. How can do that? This is what I was thinking in my head, but is not working right.
SELECT
A.*,
B.*,
PD_Info_1.Product_Name,
PD_Info_2.Product_Name,
....etc {Up to 10 Product Names}
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B ON A.Claim_Number = B.Claim_Number AND A.Customer_Number = B.Customer_Number
LEFT JOIN Company.dbo.Product_Info AS PD_Info_1 ON B.Product_ID_1 = PD_Info_1.Product_ID
LEFT JOIN Company.dbo.Product_Info AS PD_Info_2 ON B.Product_ID_2 = PD_Info_2.Product_ID
.... {Up to 10 LEFT JOIN's}
WHERE A.Filled_YearMonth = '201312' AND A.Compound_Ind = 'Y'
This query not only doesn't return the correct results, it also takes forever to run. My actual SQL is a lot longer and I've changed table names, etc but I hope that you can get the idea. If it matters, I will be creating a view based on this query.
Please advise on how to select multiple columns from the same table correctly and efficiently. Thanks!
I found put my extra stuff into CTE and add ROW_NUMBER to insure that I get only 1 row that I care about. it would look something like this. I only did for first 2 product info.
WITH PD_Info
AS ( SELECT Product_ID
,Product_Name
,Effective_Date
,ROW_NUMBER() OVER ( PARTITION BY Product_ID, Product_Name ORDER BY Effective_Date DESC ) AS RowNum
FROM Company.dbo.Product_Info)
SELECT A.*
,B.*
,PD_Info_1.Product_Name
,PD_Info_2.Product_Name
FROM Company.dbo.Company_Master AS A
LEFT JOIN Company.dbo.Compound_Info AS B
ON A.Claim_Number = B.Claim_Number
AND A.Customer_Number = B.Customer_Number
LEFT JOIN PD_Info AS PD_Info_1
ON B.Product_ID_1 = PD_Info_1.Product_ID
AND B.Fill_Date >= PD_Info_1.Effective_Date
AND PD_Info_2.RowNum = 1
LEFT JOIN PD_Info AS PD_Info_2
ON B.Product_ID_2 = PD_Info_2.Product_ID
AND B.Fill_Date >= PD_Info_2.Effective_Date
AND PD_Info_2.RowNum = 1
I created a table and i am in the process of inserting rows from another table into it. However, some of these rows require joins from other tables. To my knowledge, this means using a subquery select statement in the statement. the problem is subqueries only return one result, where i may have many. I am wanting to return a -1 where no records exists. Here is an example i am using but it is not working:
INSERT INTO [BDW_ReportPrototype].[dbo].[CustomerCreditFact]
( [MortgageDimID]
,[LeaseDimID]
,[OREODimID]
,[OfficerTypeDimID] )
SELECT
--[MortgageDimID]
-2
--LeaseDimID
,-2
--OREODimID
,-2
,CASE WHEN OfficerTypeDimID IS NULL THEN -1 ELSE OfficerTypeDimID END
FROM Staging_FDB_LN_CPDM_Daily LCD
LEFT OUTER JOIN ERMA..OfficerTypeDim OTD on OTD.OfficerNum = LCD.OFFICER
FROM dbo.Staging_FDB_LN_CPDM_Daily
Try this sql statement
SELECT CASE WHEN OfficerTypeDimID IS NULL THEN -1 ELSE OfficerTypeDimID END
FROM Staging_FDB_LN_CPDM_Daily LCD
LEFT OUTER JOIN ERMA..OfficerTypeDim OTD on OTD.OfficerNum = LCD.OFFICER
I would rework your query like the following.
First of all, use a LEFT OUTER JOIN in your query instead of the subqueries. This type of join says a row might exist in the "other" table but it might not but I want a row back regardless.
Now that you know you'll have all your rows, you'll want to see if there is a value there or not. Use the shorthand and easier to maintain check via the coalesce function. It basically is a list of values (column names, variables or hard coded values) and the optimizer will pick the first non-null value from the list and use it. Here we supply -1 for your query
INSERT INTO
[BDW_ReportPrototype].[dbo].[CustomerCreditFact]
(
[OfficerTypeDimID]
)
SELECT
-- coalesce returns the first non-null value
COALESCE(OTD.OfficerTypeDimID, -1) AS OfficerTypeDimID
FROM
dbo.Staging_FDB_LN_CPDM_Daily LCD
LEFT OUTER JOIN
ERMA..OfficerTypeDim OTD
ON OTD.OfficerNum = LCD.OFFICER
maybe something along these lines...
INSERT INTO [BDW_ReportPrototype].[dbo].[CustomerCreditFact]
([OfficerTypeDimID])
Select OfficerTypeDimID
from ERMA..OfficerTypeDim OTD
inner JOIN Staging_FDB_LN_CPDM_Daily LCD
on OTD.OfficerNum = LCD.OFFICER
UNION ALL
SELECT -1
FROM dbo.Staging_FDB_LN_CPDM_Daily LCD
WHERE NOT EXISTS
(
Select OfficerTypeDimID from ERMA..OfficerTypeDim
OTD
WHERE
OTD.OfficerNum = LCD.OFFICER
)