SQL: How to optimize an insert into a table from a temporary table

I created a procedure that dynamically collects records from various projects (databases) into a temporary table, and from that temporary table I insert into a permanent table, filtered by a WHERE clause. When I checked the execution plan, I found that this part of the query accounts for a large share of the cost. How can I optimize the INSERT or its WHERE clause?
INSERT INTO dbo.PROJECTS_TESTS ( PROJECTID, ANOTHERTID, DOMAINID, is_test)
SELECT * FROM #temp_Test AS tC
WHERE NOT EXISTS (SELECT TOP 1 1
FROM dbo.PROJECTS_TESTS AS ps WITH (NOLOCK)
WHERE ps.PROJECTID = tC.projectId
AND ps.ANOTHERTID = tC.anotherLink
AND ps.DOMAINID = tC.DOMAINID
AND ps.is_test = tC.test_project
)

I think you'd be better served by doing a JOIN than EXISTS. Depending on the cardinality of your join condition (currently in your WHERE) you might need DISTINCT in there too.
INSERT INTO dbo.PROJECTS_TESTS ( PROJECTID, ANOTHERTID, DOMAINID, is_test)
SELECT /* DISTINCT, if needed */ tC.* FROM #temp_Test AS tC
LEFT OUTER JOIN dbo.PROJECTS_TESTS AS ps ON
ps.PROJECTID = tC.projectId
AND ps.ANOTHERTID = tC.anotherLink
AND ps.DOMAINID = tC.DOMAINID
AND ps.is_test = tC.test_project
WHERE ps.PROJECTID IS NULL
or something like that
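To make the "insert only the rows that are not there yet" idea concrete, here is a minimal sketch that runs both forms (NOT EXISTS and LEFT JOIN ... IS NULL) side by side. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so it only shows that the two forms pick the same rows and says nothing about which one SQL Server executes faster. The lower-case table names and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here; names and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects_tests (projectid INT, anothertid INT, domainid INT, is_test INT);
    CREATE TEMP TABLE temp_test (projectid INT, anotherlink INT, domainid INT, test_project INT);
    INSERT INTO projects_tests VALUES (1, 10, 100, 0);
    -- one row already present in the target, one new row that appears twice
    INSERT INTO temp_test VALUES (1, 10, 100, 0), (2, 20, 200, 1), (2, 20, 200, 1);
""")

# Anti-join: keep temp rows for which the LEFT JOIN found no partner.
ANTI_JOIN = """
    SELECT DISTINCT tc.projectid, tc.anotherlink, tc.domainid, tc.test_project
    FROM temp_test AS tc
    LEFT JOIN projects_tests AS ps
           ON ps.projectid  = tc.projectid
          AND ps.anothertid = tc.anotherlink
          AND ps.domainid   = tc.domainid
          AND ps.is_test    = tc.test_project
    WHERE ps.projectid IS NULL
"""

# The same filter written with NOT EXISTS, as in the question.
NOT_EXISTS = """
    SELECT DISTINCT tc.projectid, tc.anotherlink, tc.domainid, tc.test_project
    FROM temp_test AS tc
    WHERE NOT EXISTS (SELECT 1
                      FROM projects_tests AS ps
                      WHERE ps.projectid  = tc.projectid
                        AND ps.anothertid = tc.anotherlink
                        AND ps.domainid   = tc.domainid
                        AND ps.is_test    = tc.test_project)
"""

new_rows_join = conn.execute(ANTI_JOIN).fetchall()
new_rows_exists = conn.execute(NOT_EXISTS).fetchall()

# Insert only the missing rows, then count what the target holds.
conn.execute(
    "INSERT INTO projects_tests (projectid, anothertid, domainid, is_test) " + ANTI_JOIN
)
total = conn.execute("SELECT COUNT(*) FROM projects_tests").fetchone()[0]
```

Without the DISTINCT, the duplicated temp row would be inserted twice, which is the cardinality caveat mentioned above. On the real server, compare the two execution plans rather than assuming one form wins.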

Related

Why do multiple EXISTS break a query

I am attempting to include a new table with values that need to be checked and included in a stored procedure. Statement 1 is the existing table that needs to be checked against, while statement 2 is the new table to check against.
I currently have 2 EXISTS conditions that function independently and produce the results I am expecting. By this I mean that if I comment out Statement 1, Statement 2 works, and vice versa. When I put them together the query does not complete: there is no error, but it times out, which is unexpected because each statement takes only a few seconds on its own.
I understand there is likely a better way to do this, but before I change it I would like to know why I cannot combine multiple EXISTS conditions like this. Is a WHERE clause not meant to contain multiple EXISTS conditions?
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
OR
(
--Statement 2
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = @Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
)
EDIT: I have included the query details. Table 1-5 represent different tables, there are no repeated tables.
Too long for a comment.
Your query as written seems correct. The timeout can only be diagnosed from the execution plan, but here are a few things that could be happening, or that you could benefit from checking.
Parameter sniffing on @Date. Try hard-coding this value and see whether the query is still slow.
No covering index on P.OTHER_ID, P.DATE, P.ID or SA.ID, which would cause a table scan for these predicates.
Indexes on the above columns that aren't optimal (including too many columns, etc.).
Your query running serially when it might benefit from parallelism.
Using the LOWER function on a database that doesn't have a case-sensitive collation (most don't, though this function doesn't slow things down that much).
You have a bad query plan in cache. Try adding OPTION (RECOMPILE) at the bottom so you get a new query plan. This is also useful when comparing the speed of two queries, to make sure neither uses a cached plan (or that one does while the other doesn't), which would skew the results.
Since your query is timing out, try capturing the estimated execution plan and posting it for us at Paste The Plan.
I found putting 2 EXISTS in the WHERE condition made the whole process take significantly longer. What I found fixed it was using UNION and keeping the EXISTS in separate queries. The final result looked like the following:
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
UNION
--Statement 2
SELECT *
FROM table1 S
WHERE
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = @Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
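As a sanity check on the rewrite above, here is a small sketch showing that "EXISTS ... OR EXISTS ..." and the UNION of two single-EXISTS queries return the same rows. It runs on SQLite through Python's sqlite3 module, with made-up tables s, a and b standing in for table1 and the two joined branches, so it demonstrates equivalence of the results only, not the timing behaviour seen on SQL Server.

```python
import sqlite3

# SQLite stand-in; s, a and b are hypothetical simplifications of table1..table5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (id INTEGER PRIMARY KEY, other TEXT);
    CREATE TABLE a (other_id INT);   -- plays the role of the Statement 1 branch
    CREATE TABLE b (other_id INT);   -- plays the role of the Statement 2 branch
    INSERT INTO s VALUES (1, 'foo'), (2, 'bar'), (3, 'foo'), (4, 'baz');
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (3);
""")

# Both conditions in one WHERE clause, combined with OR.
OR_FORM = """
    SELECT * FROM s
    WHERE EXISTS (SELECT 1 FROM a WHERE a.other_id = s.id)
       OR EXISTS (SELECT 1 FROM b WHERE b.other_id = s.id AND LOWER(s.other) = 'foo')
"""

# One query per condition, glued together with UNION (which removes duplicates).
UNION_FORM = """
    SELECT * FROM s
    WHERE EXISTS (SELECT 1 FROM a WHERE a.other_id = s.id)
    UNION
    SELECT * FROM s
    WHERE EXISTS (SELECT 1 FROM b WHERE b.other_id = s.id AND LOWER(s.other) = 'foo')
"""

or_rows = sorted(conn.execute(OR_FORM).fetchall())
union_rows = sorted(conn.execute(UNION_FORM).fetchall())
```

Row 1 satisfies both conditions and still appears once in the UNION result because UNION de-duplicates. That also means the two forms only agree when the rows of the outer table are distinct to begin with (for example, when it has a key); UNION ALL would return row 1 twice.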

Tuning Postgres queries

I have a requirement to join two tables based on a negation condition, and the query takes a long time to execute.
SELECT oola.ship_from_org_id ,
oola.subinventory,
oola.line_id ,
crl.requirement_header_id,
crl.inventory_item_id
FROM racesbi_ods.ods_csp_requirement_lines crl
LEFT JOIN racesbi_ods.ods_csp_req_line_details crld
ON crld.requirement_line_id = crl.requirement_line_id
JOIN racesbi_ods.ods_oe_order_lines_all oola
ON crld.source_id <> oola.line_id
AND oola.header_id IN
(SELECT header_id FROM racesbi_ods.ods_oe_order_lines_All
WHERE line_id = crld.source_id
)
To tune this I tried using temporary tables, but I am still facing performance issues.
create temporary table tst1 --ON commit drop 244067
as(select crl.requirement_header_id,
crl.inventory_item_id,
crld.requirement_line_id,
crld.source_id FROM racesbi_ods.ods_csp_requirement_lines crl
LEFT JOIN racesbi_ods.ods_csp_req_line_details crld
ON crld.requirement_line_id = crl.requirement_line_id
) distributed randomly;
-- Query returned successfully: 244067 rows affected, 15264 ms execution time.
create temporary table tst2 --ON commit drop 2700951
as(
select ship_from_org_id,
subinventory,
line_id
FROM racesbi_ods.ods_oe_order_lines_all
) distributed randomly;
create temporary table tst3 --ON commit drop
as(
select tst1.requirement_header_id,
tst1.inventory_item_id,
tst2.ship_from_org_id,
tst2.subinventory,
tst2.line_id
FROM tst1
JOIN tst2 ON tst2.line_id != tst1.source_id
) distributed randomly;
Kindly advise how to handle a negation condition in JOINs.

SQL Creating a temp table for this Query

I want to create a temp table to store this query so that I can then update another table.
SELECT SaginawUtilityData.ServiceFrom, SaginawUtilityData.ServiceThru,
BillingMonth = dbo.fn_MonthWithMostDaysInRange(ServiceFrom, ServiceThru),
SaginawUtilityData.Usage, SaginawUtilityData.UtilityCharges
FROM SaginawUtilityData
JOIN tblMEP_CustomerAccounts
ON SaginawUtilityData.AccountNumber = tblMEP_CustomerAccounts.AccountNumber
JOIN tblMEP_Customers
ON tblMEP_CustomerAccounts.CustomerID = tblMEP_Customers.ID
JOIN tblMEP_UtilityCompanies
ON tblMEP_UtilityCompanies.ID = tblMEP_CustomerAccounts.UtilityCompanyID
JOIN tblMEP_Meters
ON tblMEP_CustomerAccounts.ID = tblMEP_Meters.CustomerAccountID
WHERE tblMEP_Customers.ID = 43
It would be great if you could just do the update. This is the other table, with the columns I want to populate from the query above:
SELECT tblMEP_MonthlyDATA.CycleStartDate, tblMEP_MonthlyDATA.CycleEndDate,
tblMEP_MonthlyDATA.BillingMonth, tblMEP_MonthlyDATA.Consumption,
tblMEP_MonthlyDATA.Charge
FROM tblMEP_MonthlyData
Thanks.
I'm uncertain whether I really understand what you're asking but I think you're asking how you can put the data from the top query straight into the table columns shown in the bottom query, so:
INSERT tblMEP_MonthlyDATA (
CycleStartDate, CycleEndDate, BillingMonth, Consumption, Charge)
SELECT
SaginawUtilityData.ServiceFrom,
SaginawUtilityData.ServiceThru,
dbo.fn_MonthWithMostDaysInRange(ServiceFrom, ServiceThru),
SaginawUtilityData.Usage,
SaginawUtilityData.UtilityCharges
FROM SaginawUtilityData
JOIN tblMEP_CustomerAccounts ON SaginawUtilityData.AccountNumber =
tblMEP_CustomerAccounts.AccountNumber
JOIN tblMEP_Customers ON tblMEP_CustomerAccounts.CustomerID = tblMEP_Customers.ID
JOIN tblMEP_UtilityCompanies ON tblMEP_UtilityCompanies.ID =
tblMEP_CustomerAccounts.UtilityCompanyID
JOIN tblMEP_Meters ON tblMEP_CustomerAccounts.ID = tblMEP_Meters.CustomerAccountID
WHERE
tblMEP_Customers.ID = 43 --or other set of conditions
Hope this helps (or is at least vaguely relevant to what you want to know)
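For anyone who wants to try the INSERT ... SELECT pattern in isolation, here is a minimal sketch on SQLite via Python's sqlite3 module. The three tables are simplified, hypothetical stand-ins for SaginawUtilityData, tblMEP_CustomerAccounts and tblMEP_MonthlyData, and the call to dbo.fn_MonthWithMostDaysInRange is left out because that function only exists in the asker's database.

```python
import sqlite3

# Hypothetical, cut-down schema; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE utility_data (account_number TEXT, service_from TEXT,
                               service_thru TEXT, usage_amount REAL, charges REAL);
    CREATE TABLE customer_accounts (account_number TEXT, customer_id INTEGER);
    CREATE TABLE monthly_data (cycle_start TEXT, cycle_end TEXT,
                               consumption REAL, charge REAL);
    INSERT INTO utility_data VALUES
        ('A-1', '2020-01-01', '2020-01-31', 120.0, 35.5),
        ('B-7', '2020-01-01', '2020-01-31', 300.0, 80.0);
    INSERT INTO customer_accounts VALUES ('A-1', 43), ('B-7', 99);
""")

# No temp table needed: the SELECT feeds the INSERT directly.
# Source columns are matched to target columns by position, not by name.
conn.execute(
    """
    INSERT INTO monthly_data (cycle_start, cycle_end, consumption, charge)
    SELECT u.service_from, u.service_thru, u.usage_amount, u.charges
    FROM utility_data AS u
    JOIN customer_accounts AS ca ON ca.account_number = u.account_number
    WHERE ca.customer_id = ?
    """,
    (43,),
)
loaded = conn.execute("SELECT * FROM monthly_data").fetchall()
```

Only the row belonging to customer 43 ends up in the target table. Because the mapping is positional, listing the target columns explicitly (as the answer does) is what keeps the mapping readable.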

SQL Server 2008 R2 query

I'm running the following query, but it is taking too long. Is there a way to make it faster or change the way the query is written?
Please help.
SELECT *
FROM ProductGroupLocUpdate WITH (nolock)
WHERE CmStatusFlag > 2
AND EngineID IN ( 0, 1 )
AND NOT EXISTS (SELECT DISTINCT APGV.LocationID
FROM CM_ST_ActiveProductGroupsView AS APGV WITH(nolock)
WHERE APGV.LocationID = ProductGroupLocUpdate.Locationid);
Try rewriting the query with a join:
SELECT PGLU.* from ProductGroupLocUpdate PGLU WITH (NOLOCK)
LEFT JOIN CM_ST_ActiveProductGroupsView APGV WITH (NOLOCK)
ON PGLU.LocationId = APGV.LocationID
WHERE APGV.LocationID IS NULL AND CmStatusFlag>2 AND EngineID IN (0,1)
Depending on how much data is in your tables, consider adding indexes on LocationId (in both tables), CmStatusFlag and EngineID.

Joining on one of Two Tables Based on Parameter

Not sure if this can be done, but here is what I am trying to do.
I have two tables:
Table 1 is called Task and it contains all of the possible Task Names
Table 2 is called Task_subset and it contains only a subset of the Task Names included in Table 1
I have a variable called @TaskControl that is passed in as a parameter; it is equal to either Table1 or Table2
Based on the value of the @TaskControl variable I want to join one of my Task tables
For example:
If @TaskControl = 'Table1':
Select * From Orders O Join Task T on T.id = O.id
If @TaskControl = 'Table2':
Select * From Orders O Join Task_subset T on T.id = O.id
How would I do this in SQL Server 2008?
Don't overcomplicate it. Put it into a stored proc like so:
CREATE PROCEDURE dbo.MyProcedure(@TaskControl varchar(20))
AS
If @TaskControl = 'Table1'
Select * From Orders O Join Task T on T.id = O.id
ELSE If @TaskControl = 'Table2'
Select * From Orders O Join Task_subset T on T.id = O.id
ELSE SELECT 'Invalid Parameter'
Or just straight TSQL with no proc:
If @TaskControl = 'Table1'
Select * From Orders O Join Task T on T.id = O.id
ELSE If @TaskControl = 'Table2'
Select * From Orders O Join Task_subset T on T.id = O.id
Doing it exactly as you do it right now is the best way. Having one single statement that attempts to somehow dynamically join one of two tables is the last thing you want. T-SQL is a language for data access, not for DRY code-reuse programming. If you attempt to have a single statement, then the optimizer has to come up with a plan that always works, no matter the value of @TaskControl, and so the plan will always have to join both tables.
A more lengthy discussion on this topic is Dynamic Search Conditions in T-SQL (your dynamic join falls into the same topic as dynamic search).
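If the branching happens in application code rather than in a stored procedure, the same "one plain statement per case" idea might look like the sketch below. It uses Python with sqlite3 as a stand-in for SQL Server; the function name orders_with_tasks, the name column and the sample rows are all hypothetical.

```python
import sqlite3

# One fixed statement per allowed value; the parameter only selects between them
# and is never spliced into the SQL text.
QUERIES = {
    "Table1": "SELECT o.id, t.name FROM orders AS o "
              "JOIN task AS t ON t.id = o.id ORDER BY o.id",
    "Table2": "SELECT o.id, t.name FROM orders AS o "
              "JOIN task_subset AS t ON t.id = o.id ORDER BY o.id",
}


def orders_with_tasks(conn, task_control):
    """Run the join that matches task_control, or reject unknown values."""
    try:
        sql = QUERIES[task_control]
    except KeyError:
        raise ValueError(f"Invalid parameter: {task_control!r}") from None
    return conn.execute(sql).fetchall()


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE task_subset (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO task VALUES (1, 'pick'), (2, 'pack'), (3, 'ship');
    INSERT INTO task_subset VALUES (2, 'pack');
""")

all_tasks = orders_with_tasks(conn, "Table1")
subset_tasks = orders_with_tasks(conn, "Table2")
```

Each branch is an ordinary two-table join, so each gets its own simple plan, and an unexpected parameter value fails loudly instead of silently returning nothing.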
If they are UNION compatible you could give this a shot. From a quick test this end it only appears to access the relevant table.
I do agree more with JNK's and Remus's answers however. This does have a recompilation cost for every invocation and not much benefit.
;WITH T AS
(
SELECT 'Table1' AS TaskControl, id
FROM Task
UNION ALL
SELECT 'Table2' AS TaskControl, id
FROM Task_subset
)
SELECT *
FROM T
JOIN Orders O on T.id = O.id
WHERE TaskControl = @TaskControl
OPTION (RECOMPILE)
I don't know how good performance would be, and this would not scale well as you add on additional optional tables, but this should work in the situation that you present.
SELECT
O.some_column,
COALESCE(T.some_task_column, TS.some_task_subset_column)
FROM
Orders O
LEFT OUTER JOIN Tasks T ON
@task_control = 'Tasks' AND
T.id = O.id
LEFT OUTER JOIN Task_Subsets TS ON
@task_control = 'Task Subsets' AND
TS.id = O.id
Try the following. It should avoid the stored procedure plan getting bound based on the value of the parameter passed during the first execution of the stored procedure (See SQL Server Parameter Sniffing for details):
create proc dbo.foo
@TaskControl varchar(32)
as
declare @selection varchar(32)
set @selection = @TaskControl
select *
from dbo.Orders t
join dbo.Task t1 on t1.id = t.id
where @selection = 'Table1'
UNION ALL
select *
from dbo.Orders t
join dbo.Task_subset t1 on t1.id = t.id
where @selection = 'Table2'
return 0
go
The stored procedure shouldn't get recompiled for each invocation either, as @Martin suggested might happen, and the parameter value first passed in should not influence the execution plan that gets bound. But if performance is an issue, run a SQL trace with the Profiler and see whether the cached execution plan is reused or a recompile is triggered.
One thing, though: you will need to ensure that each individual select in the UNION returns the exact same columns. Each select in a UNION must have the same number of columns, and each column must have a common type (or a default conversion to the common type). The first select defines the names of the columns in the result set; the result types follow SQL Server's data type precedence rules.
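The column rule is easy to see in a tiny experiment. The sketch below uses SQLite through Python's sqlite3 module, so it only illustrates the two parts that SQLite shares with SQL Server: the result takes its column names from the first select, and a mismatched column count is rejected. SQLite is loosely typed and does not enforce the type-compatibility part. The aliases are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two selects with different aliases: the result uses the first select's names.
cur = conn.execute(
    "SELECT 1 AS first_name, 'x' AS label "
    "UNION ALL "
    "SELECT 2 AS other_name, 'y' AS tag"
)
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()

# Two selects with a different number of columns: the statement is rejected.
try:
    conn.execute("SELECT 1, 2 UNION ALL SELECT 3")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)
```

This is also why `select *` on both sides of the UNION ALL is fragile: it only works while Task and Task_subset keep the same column layout, so listing the columns explicitly is the safer habit.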