Joining on one of Two Tables Based on Parameter - sql

Not sure if this can be done, but here is what I am trying to do.
I have two tables:
Table 1 is called Task and it contains all of the possible Task Names
Table 2 is called Task_subset and it contains only a subset of the Task Names included in Table 1
I have a variable called #TaskControl, that is passed in as a parameter, it either is equal to Table1 or Table2
Based on the value of the #TaskControl variable I want to join one of my Task Tables
For example:
If #TaskControl = 'Table1':
Select * From Orders O Join Task T on T.id = O.id
If #TaskControl = 'Table2):
Select * From Orders O Join Task_subset T on T.id = O.id
How would I do this, Sql Server 08

Don't overcomplicate it. Put it into a stored proc like so:
CREATE PROCEDURE dbo.MyProcedure(#TaskControl varchar(20))
AS
If #TaskControl = 'Table1'
Select * From Orders O Join Task T on T.id = O.id
ELSE If #TaskControl = 'Table2'
Select * From Orders O Join Task_subset T on T.id = O.id
ELSE SELECT 'Invalid Parameter'
Or just straight TSQL with no proc:
If #TaskControl = 'Table1'
Select * From Orders O Join Task T on T.id = O.id
ELSE If #TaskControl = 'Table2'
Select * From Orders O Join Task_subset T on T.id = O.id

Doing it exactly as you do it right now is the best way. Having one single statement that attempts to somehow dynamically join one of two statements is the last thing you want. T-SQL is a language for data access, not for DRY code-reuse programming. If you attempt to have a single statement then the optimizer has to come up with a plan that always work, no matter the value of #TaskControl, and so the plan will always have to join both tables.
A more lengthy discussion on this topic is Dynamic Search Conditions in T-SQL (your dynamic join falls into the same topic as dynamic search).

If they are UNION compatible you could give this a shot. From a quick test this end it only appears to access the relevant table.
I do agree more with JNK's and Remus's answers however. This does have a recompilation cost for every invocation and not much benefit.
;WITH T AS
(
SELECT 'Table1' AS TaskControl, id
FROM Task
UNION ALL
SELECT 'Table2' AS TaskControl, id
FROM Task_subset
)
SELECT *
FROM T
JOIN Orders O on T.id = O.id
WHERE TaskControl = #TaskControl
OPTION (RECOMPILE)

I don't know how good performance would be, and this would not scale well as you add on additional optional tables, but this should work in the situation that you present.
SELECT
O.some_column,
COALESCE(T.some_task_column, TS.some_task_subset_column)
FROM
Orders O
LEFT OUTER JOIN Tasks T ON
#task_control = 'Tasks' AND
T.id = O.id
LEFT OUTER JOIN Task_Subsets TS ON
#task_control = 'Task Subsets' AND
TS.id = O.id

Try the following. It should avoid the stored procedure plan getting bound based on the value of the parameter passed during the first execution of the stored procedure (See SQL Server Parameter Sniffing for details):
create proc dbo.foo
#TaskControl varchar(32)
as
declare #selection varchar(32)
set #selection = #TaskControl
select *
from dbo.Orders t
join dbo.Task t1 on t1.id = t.id
where #selection = 'Table1'
UNION ALL
select *
from dbo.Orders t
join dbo.Task_subset t1 on t1.id = t.id
where #selection = 'Table2'
return 0
go
The stored procedure shouldn't get recompiled for each invocation, either, as #Martin suggested might happen, but the parameter value 1st passed in should not influence the execution plan the gets bound. But if performance is an issue, run a sql trace with the profiler and see if the cached execution plan is reused or if a recompile is triggered.
One thing, though: you will need to ensure, though, that each individual select in the UNION returns the exact same columns. Each select in a UNION must have the same number of columns and each column must have a common type (or default conversion to the common type). The 1st select defines the number, types and names of the columns in the result set.

Related

How to convert inline SQL queries to JOINS in SQL SERVER to reduce load time

I need help in optimizing this SQL query.
In the main SELECT statement there are three columns which is dependent on the outer query result. This is why my query is taking a long time to return data. I have tried making left joins but this is not working properly.
Can anyone help me to resolve this issue?
SELECT
DISTINCT ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
(
SELECT
STRING_AGG(
(ug.UG_Name),
','
)
FROM
Groups ug
INNER JOIN ApplicantUserGroup augm ON augm.AUGM_UserGroupID = ug.UG_ID
WHERE
augm.AUGM_OrganizationUserID = ou.OrganizationUserID
AND ug.UG_IsDeleted = 0
AND augm.AUGM_IsDeleted = 0
) AS UserGroups,
order1.OrderNumber AS OrderId -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttribute),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributes -- UAT-2455
,
(
SELECT
STRING_AGG(
(CActe.CustomAttributeID),
','
)
FROM
CustomAttributeCte CActe
WHERE
CActe.HierarchyNodeID = dpm.DPM_ID
AND CActe.OrganizationUserID = ps.OrganizationUserID
) AS CustomAttributeID
FROM
ApplicantData acd WITH (NOLOCK)
INNER JOIN ClientPackage ps WITH (NOLOCK) ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
INNER JOIN [ClientOrder] order1 WITH (NOLOCK) ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
INNER JOIN OUser ou WITH (NOLOCK) ON ou.OrganizationUserID = ps.OrganizationUserID
It looks like this query can be simplified, and the dependent subqueries in your SELECT clause removed, Consider your second and third dependent subqueries. You can refactor them into one nondependent subquery with a LEFT JOIN. Using nondependent subqueries is more efficient because the query planner can run them just once, rather than once for each row.
You want two STRING_AGG() results from the same table. This subquery gives those two outputs for every possible combination of HierarchyNodeID and OrganizationUserID values. STRING_AGG() is an aggregate function like SUM() and so works nicely with GROUP BY.
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes -- UAT-2455,
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
You can run this subquery itself to convince yourself it works.
Now, we can LEFT JOIN that into your query. Like this. (For readability I took out the NOLOCKs and used JOIN: it means the same thing as INNER JOIN.)
SELECT DISTINCT
ou.OrganizationUserID AS StudentID,
ou.FirstName,
ou.LastName,
'tempvalue' AS UserGroups, -- shortened for testing
order1.OrderNumber AS OrderId, -- UAT-2455
uat2455.CustomAttributes, -- UAT-2455
uat2455.CustomAttributeIDs -- UAT-2455
FROM ApplicantData acd
JOIN ClientPackage ps
ON acd.ClientSubscriptionID = ps.ClientSubscriptionID
JOIN ClientOrder order1
ON order1.OrderID = ps.OrderID
AND order1.IsDeleted = 0
JOIN OUser ou
ON ou.OrganizationUserID = ps.OrganizationUserID
LEFT JOIN (
SELECT HierarchyNodeID, OrganizationUserID,
STRING_AGG((CActe.CustomAttribute), ',') CustomAttributes -- UAT-2455,
STRING_AGG((CActe.CustomAttributeID), ',') CustomAttributeIDs -- UAT-2455
FROM CustomAttributeCte CActe
GROUP BY HierarchyNodeID, OrganizationUserID
) uat2455
ON uat2455.HierarchyNodeID = dpm.DPM_ID
AND uat2455.OrganizationUserId = ps.OrganizationUserID
See how we collapsed your second and third dependent subqueries to just one, then used it as a virtual table with LEFT JOIN? We transformed the WHERE clauses from the dependent subqueries into an ON clause.
You can test this: run it with TOP(50) and eyeball the results.
When you're happy, the next step is to transform your first dependent subquery the same way.
Pro tip Don't use WITH (NOLOCK), ever, unless a database administration expert tells you to after looking at your specific query. If your query's purpose is a historical report and you don't care whether the most recent transactions in your database are represented exactly right, you can precede your query with this statement. It also allows the query to run while avoiding locks.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Pro tip Be obsessive about formatting your queries for readability. You, your colleagues, and yourself a year from now must be able to read and reason about queries like this.

create a table based on the results (output) of previous queries

how can I create a table based on the output of previous queries? I tried Create table as select distinct syntax but it doesn't work.
ps: the previous queries contain inner join SET dateformat dmy
select distinct c.peril, c.period, c.eventid, c.eventdate, c.loss_index_2016,
c.loss_index_20182019, a.eventdatetime, b.region, a.year,a.eventid
from RetroNCTJan2019v5 a
inner join retropltheader2019v5 b
on a.segmentid=b.segmentid
inner join Index2019 c
on b.region = c.peril and a.Year = c.period and
a.eventid=convert(numeric(30,0),convert(float(28),c.eventid)) and
month(eventdate) = month(eventdatetime) and day(eventdate)=day(eventdatetime)
In SQL Server you insert the results of a SELECT into a new table using INTO:
select distinct c.peril, c.period, c.eventid,c.eventdate, c.loss_index_2016, c.loss_index_20182019,
a.eventdatetime, b.region, a.year,a.eventid
into new_table
from RetroNCTJan2019v5 a join
retropltheader2019v5 b
on a.segmentid = b.segmentid join
Index2019 c
on b.region = c.peril and a.Year = c.period and
a.eventid = convert(numeric(30,0), convert(float(28),c.eventid)) and
month(eventdate) = month(eventdatetime) and
day(eventdate) = day(eventdatetime)
Most databases use create table as, which in my opinion, is a little clearer.
That said, you should learn some better coding habits:
Use table aliases that are meaningful. This usually means abbreviations of the table names. Don't use arbitrary letters.
Qualify all column references, especially when your query has more than one table reference.
Converting to a float and then a numeric is probably unnecessary. In fact, having to do a type conversion in a JOIN condition is suspicious -- and even more so when the two columns have the same name. Same name --> same type.

Why do multiple EXISTS break a query

I am attempting to include a new table with values that need to be checked and included in a stored procedure. Statement 1 is the existing table that needs to be checked against, while statement 2 is the new table to check against.
I currently have 2 EXISTS conditions that function independently and produce the results I am expecting. By this I mean if I comment out Statement 1, statement 2 works and vice versa. When I put them together the query doesn't complete, there is no error but it times out which is unexpected because each statement only takes a few seconds.
I understand there is likely a better way to do this but before I do, I would like to know why I cannot seem to do multiple exists statements like this? Are there not meant to be multiple EXISTS conditions in the WHERE clause?
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = #Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
OR
(
--Statement 2
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = #Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
)
EDIT: I have included the query details. Table 1-5 represent different tables, there are no repeated tables.
Too long to comment.
Your query as written seems correct. The timeout will only be able to be troubleshot from the execution plan, but here are a few things that could be happening or that you could benefit from.
Parameter sniffing on #Date. Try hard-coding this value and see if you still get the same slowness
No covering index on P.OTHER_ID or P.DATE or P.ID or SA.ID which would cause a table scan for these predicates
Indexes for the above columns which aren't optimal (including too many columns, etc)
Your query being serial when it may benefit from parallelism.
Using the LOWER function on a database which doesn't have a case sensitive collation (most don't, though this function doesn't slow things down that much)
You have a bad query plan in cache. Try adding OPTION (RECOMPILE) at the bottom so you get a new query plan. This is also done when comparing the speed of two queries to ensure they aren't using cached plans, or one isn't when another is which would skew the results.
Since your query is timing out, try including the estimated execution plan and post it for us at past the plan
I found putting 2 EXISTS in the WHERE condition made the whole process take significantly longer. What I found fixed it was using UNION and keeping the EXISTS in separate queries. The final result looked like the following:
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = #Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
UNION
--Statement 2
SELECT *
FROM table1 S
WHERE
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = #Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)

SQL How to optimize insert to table from temporary table

I created procedure where dynamically collecting from various projects (Databases) some records into temporary table and from that temporary table I am inserting into table. With WHERE statement , but unfortunately when I checked with Execution plan I find out, that this query part take a lot of load. How can I optimize this INSERT part or WHERE statement ?
INSERT INTO dbo.PROJECTS_TESTS ( PROJECTID, ANOTHERTID, DOMAINID, is_test)
SELECT * FROM #temp_Test AS tC
WHERE NOT EXISTS (SELECT TOP 1 1
FROM dbo.PROJECTS_TESTS AS ps WITH (NOLOCK)
WHERE ps.PROJECTID = tC.projectId
AND ps.ANOTHERTID = tC.anotherLink
AND ps.DOMAINID = tC.DOMAINID
AND ps.is_test = tC.test_project
)
I think you'd be better served by doing a JOIN than EXISTS. Depending on the cardinality of your join condition (currently in your WHERE) you might need DISTINCT in there too.
INSERT INTO dbo.PROJECTS_TESTS ( PROJECTID, ANOTHERTID, DOMAINID, is_test)
SELECT <maybe distinct> tC.* FROM #temp_Test AS tC
LEFT OUTER JOIN FROM dbo.PROJECTS_TESTS AS ps on
ps.PROJECTID = tC.projectId
AND ps.ANOTHERTID = tC.anotherLink
AND ps.DOMAINID = tC.DOMAINID
AND ps.is_test = tC.test_project
where ps.PROJECT ID IS NULL
or something like that

SQL Stored Procedure Performance Fine on SQL2008 but Awful on SQL2005

I have a stored procedure that I have developed on a SQL2008 server that runs <1sec. On another server which is SQL2005 the same sp on the same database takes ~1minute. Without going into the details of the database schema can anyone see anything obvious in this SP that may cause this performance discrepancy? Could it be the use of the CTE? Is there an alternative?
EDIT - I have now noticed that if I run the SQL directly on SQL 2005 it runs in ~4secs but executing the SP still takes over a minute?? Looks like the problem may like in the SP execution??
CREATE PROCEDURE Workflow.GetTopTasks
-- Add the parameters for the stored procedure here
#ownerUserId int,
#topN int = 10
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SET ROWCOUNT #topN;
-- Insert statements for procedure here
WITH cteCalculatedDate (MilestoneDateId, CalculatedMilestoneDate)
AS
(
-- Anchor member definition
SELECT md.MilestoneDateId, md.SpecifiedDate
FROM Workflow.MilestoneDate md
WHERE md.RelativeMilestoneDateId IS NULL
UNION ALL
-- Recursive member definition
SELECT md.MilestoneDateId, CalculatedMilestoneDate + md.RelativeDays
FROM Workflow.MilestoneDate md
INNER JOIN cteCalculatedDate cte
on md.RelativeMilestoneDateId = cte.MilestoneDateId
)
-- Statement that executes the CTE
select
we.*
from Workflow.WorkflowElement we
left outer join cteCalculatedDate cte
on cte.MilestoneDateId = we.DueDateId
inner join Workflow.WorkflowInstance wi
on wi.WorkflowInstanceId = we.WorkflowInstanceId
left outer join Workflow.SchemeWorkflow sw
on sw.WorkflowInstanceId = wi.WorkflowInstanceId
left outer join Workflow.Scheme s
on s.SchemeId = sw.SchemeId
inner join Workflow.WorkflowDefinition wd
on wd.WorkflowDefinitionId = wi.WorkflowDefinitionId
where
we.OwnerId = #ownerUserId -- for given owner
and we.CompletedDate is null -- is not completed
and we.ElementTypeId <= 4 -- is Action, Data, Decision or Document (Not End, Start or KeyDate)
and cte.CalculatedMilestoneDate is not null -- has a duedate
UNION
select
we.*
from Workflow.WorkflowElement we
left outer join cteCalculatedDate cte
on cte.MilestoneDateId = we.DueDateId
inner join Workflow.WorkflowInstance wi
on wi.WorkflowInstanceId = we.WorkflowInstanceId
left outer join Workflow.SchemeWorkflow sw
on sw.WorkflowInstanceId = wi.WorkflowInstanceId
left outer join Workflow.Scheme s
on s.SchemeId = sw.SchemeId
inner join Workflow.WorkflowDefinition wd
on wd.WorkflowDefinitionId = wi.WorkflowDefinitionId
where
we.OwnerId = #ownerUserId -- for given owner
and we.CompletedDate is null -- is not completed
and we.ElementTypeId <= 4 -- is Action, Data, Decision or Document (Not End, Start or KeyDate)
and cte.CalculatedMilestoneDate is null -- does NOT have a duedate
SET ROWCOUNT 0
END
EDIT - I have now noticed that if I
run the SQL directly on SQL 2005 it
runs in ~4secs but executing the SP
still takes over a minute??
Bad parameter sniffing then:
http://elegantcode.com/2008/05/17/sql-parameter-sniffing-and-what-to-do-about-it/
SQL poor stored procedure execution plan performance - parameter sniffing
Parameter sniffing was bad in 2005, but better in 2008.
You union is selecting CalculatedMilestoneDate equal to NULL and not equal to Null.
This is redundant, the entire UNION can be removed by just removing the condition on CalculatedMilestoneDate from the where clause.
Other than that, you should verify that both databases have the same indexes defined.
-- Statement that executes the CTE
select
we.*
from Workflow.WorkflowElement we
left outer join cteCalculatedDate cte
on cte.MilestoneDateId = we.DueDateId
inner join Workflow.WorkflowInstance wi
on wi.WorkflowInstanceId = we.WorkflowInstanceId
left outer join Workflow.SchemeWorkflow sw
on sw.WorkflowInstanceId = wi.WorkflowInstanceId
left outer join Workflow.Scheme s
on s.SchemeId = sw.SchemeId
inner join Workflow.WorkflowDefinition wd
on wd.WorkflowDefinitionId = wi.WorkflowDefinitionId
where
we.OwnerId = #ownerUserId -- for given owner
and we.CompletedDate is null -- is not completed
and we.ElementTypeId <= 4 -- is Action, Data, Decision or Document (Not End, Start or KeyDate)
If the schemas match then perhaps you are missing important indexes in the sql server 2005 instance. Try running the sql server tuning advisors and applying its index recommendations.