Tuning Postgres queries - sql

I have a requirement to join two tables on a negated (not-equal) condition, and the query takes a long time to execute.
SELECT oola.ship_from_org_id ,
oola.subinventory,
oola.line_id ,
crl.requirement_header_id,
crl.inventory_item_id
FROM racesbi_ods.ods_csp_requirement_lines crl
LEFT JOIN racesbi_ods.ods_csp_req_line_details crld
ON crld.requirement_line_id = crl.requirement_line_id
JOIN racesbi_ods.ods_oe_order_lines_all oola
ON crld.source_id <> oola.line_id
AND oola.header_id IN
(SELECT header_id FROM racesbi_ods.ods_oe_order_lines_All
WHERE line_id = crld.source_id
)
To tune this I tried using temporary tables, but I am still facing performance issues.
create temporary table tst1 --ON commit drop 244067
as(select crl.requirement_header_id,
crl.inventory_item_id,
crld.requirement_line_id,
crld.source_id FROM racesbi_ods.ods_csp_requirement_lines crl
LEFT JOIN racesbi_ods.ods_csp_req_line_details crld
ON crld.requirement_line_id = crl.requirement_line_id
) distributed randomly;
-- Query returned successfully: 244067 rows affected, 15264 ms execution time.
create temporary table tst2 --ON commit drop 2700951
as(
select ship_from_org_id,
subinventory,
line_id
FROM racesbi_ods.ods_oe_order_lines_all
) distributed randomly;
create temporary table tst3 --ON commit drop
as(
select tst1.requirement_header_id,
tst1.inventory_item_id,
tst2.ship_from_org_id,
tst2.subinventory,
tst2.line_id
FROM tst1
JOIN tst2 ON tst2.line_id != tst1.source_id
) distributed randomly;
Kindly advise how to handle a negation condition in JOINs.
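One way to read the query above: the `<>` condition combined with the `IN` subquery asks for "the other lines of the order that contains `source_id`". Under that reading, the inequality join can be expressed as an equality join on `header_id` plus a cheap inequality filter. The sketch below checks the equivalence on toy data with Python's built-in sqlite3. The shortened table names and the sample rows are made up for illustration, and it assumes `line_id` is unique in the order-lines table; it says nothing about how Postgres or Greenplum will plan either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, shortened stand-ins for the three tables in the question.
    CREATE TABLE req_lines   (requirement_line_id INTEGER, requirement_header_id INTEGER, inventory_item_id INTEGER);
    CREATE TABLE req_details (requirement_line_id INTEGER, source_id INTEGER);
    CREATE TABLE order_lines (line_id INTEGER PRIMARY KEY, header_id INTEGER, ship_from_org_id INTEGER, subinventory TEXT);

    INSERT INTO req_lines   VALUES (1, 100, 500), (2, 100, 501), (3, 101, 502);
    INSERT INTO req_details VALUES (1, 10), (2, 20);   -- requirement line 3 has no detail row
    INSERT INTO order_lines VALUES (10, 1, 7, 'A'), (11, 1, 7, 'B'), (12, 1, 8, 'C'),
                                   (20, 2, 7, 'A'), (21, 2, 9, 'D'), (30, 3, 9, 'E');
""")

# Same shape as the query in the question: inequality join plus correlated IN.
original = """
    SELECT oola.ship_from_org_id, oola.subinventory, oola.line_id,
           crl.requirement_header_id, crl.inventory_item_id
    FROM req_lines crl
    LEFT JOIN req_details crld ON crld.requirement_line_id = crl.requirement_line_id
    JOIN order_lines oola
      ON crld.source_id <> oola.line_id
     AND oola.header_id IN (SELECT header_id FROM order_lines WHERE line_id = crld.source_id)
"""

# Rewrite: find the source line first, then its siblings through an equality join.
rewritten = """
    SELECT oola.ship_from_org_id, oola.subinventory, oola.line_id,
           crl.requirement_header_id, crl.inventory_item_id
    FROM req_lines crl
    JOIN req_details crld ON crld.requirement_line_id = crl.requirement_line_id
    JOIN order_lines src  ON src.line_id = crld.source_id
    JOIN order_lines oola ON oola.header_id = src.header_id
                         AND oola.line_id <> src.line_id
"""

original_rows = sorted(conn.execute(original).fetchall())
rewritten_rows = sorted(conn.execute(rewritten).fetchall())
print(original_rows == rewritten_rows)
```

Note that the LEFT JOIN in the original behaves like an inner join anyway, because a NULL `source_id` can never satisfy the `<>` condition.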

Related

Why do multiple EXISTS break a query

I am attempting to include a new table with values that need to be checked and included in a stored procedure. Statement 1 is the existing table that needs to be checked against, while statement 2 is the new table to check against.
I currently have 2 EXISTS conditions that function independently and produce the results I am expecting. By this I mean if I comment out Statement 1, statement 2 works and vice versa. When I put them together the query doesn't complete, there is no error but it times out which is unexpected because each statement only takes a few seconds.
I understand there is likely a better way to do this but before I do, I would like to know why I cannot seem to do multiple exists statements like this? Are there not meant to be multiple EXISTS conditions in the WHERE clause?
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
OR
(
--Statement 2
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = @Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
)
EDIT: I have included the query details. Table 1-5 represent different tables, there are no repeated tables.
Too long to comment.
Your query as written seems correct. The timeout will only be able to be troubleshot from the execution plan, but here are a few things that could be happening or that you could benefit from.
Parameter sniffing on @Date. Try hard-coding this value and see if you still get the same slowness.
No covering index on P.OTHER_ID or P.DATE or P.ID or SA.ID which would cause a table scan for these predicates
Indexes for the above columns which aren't optimal (including too many columns, etc)
Your query being serial when it may benefit from parallelism.
Using the LOWER function on a database which doesn't have a case sensitive collation (most don't, though this function doesn't slow things down that much)
You have a bad query plan in cache. Try adding OPTION (RECOMPILE) at the bottom so you get a new query plan. This is also done when comparing the speed of two queries to ensure they aren't using cached plans, or one isn't when another is which would skew the results.
Since your query is timing out, try capturing the estimated execution plan and posting it for us at Paste The Plan.
I found putting 2 EXISTS in the WHERE condition made the whole process take significantly longer. What I found fixed it was using UNION and keeping the EXISTS in separate queries. The final result looked like the following:
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
UNION
--Statement 2
SELECT *
FROM table1 S
WHERE
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = @Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
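One caveat with the UNION rewrite: UNION removes duplicate rows, so it returns the same result as the OR version only when table1 has no fully duplicated rows. Here is a minimal sketch of that difference using Python's built-in sqlite3. The tables `s`, `p1` and `p2` and their rows are invented for illustration and are much simpler than the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins: s plays table1, p1 and p2 play the two EXISTS sources.
    CREATE TABLE s  (id INTEGER, other TEXT);
    CREATE TABLE p1 (other_id INTEGER);
    CREATE TABLE p2 (other_id INTEGER);
    INSERT INTO s  VALUES (1, 'foo'), (2, 'bar'), (3, 'foo'), (3, 'foo');  -- id 3 duplicated on purpose
    INSERT INTO p1 VALUES (1), (3);
    INSERT INTO p2 VALUES (2), (3);
""")

or_form = """
    SELECT * FROM s
    WHERE EXISTS (SELECT 1 FROM p1 WHERE p1.other_id = s.id)
       OR EXISTS (SELECT 1 FROM p2 WHERE p2.other_id = s.id)
"""
union_form = """
    SELECT * FROM s WHERE EXISTS (SELECT 1 FROM p1 WHERE p1.other_id = s.id)
    UNION
    SELECT * FROM s WHERE EXISTS (SELECT 1 FROM p2 WHERE p2.other_id = s.id)
"""

or_rows = sorted(conn.execute(or_form).fetchall())
union_rows = sorted(conn.execute(union_form).fetchall())

print(len(or_rows), len(union_rows))       # the OR form keeps the duplicated row
print(set(or_rows) == set(union_rows))     # same distinct rows either way
```

If table1 has a primary key and the key is part of the select list, the two forms return the same rows.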

SQL How to optimize insert to table from temporary table

I created a procedure that dynamically collects records from various projects (databases) into a temporary table, and then inserts from that temporary table into a table, filtered by a WHERE clause. When I checked the execution plan, I found that this part of the query accounts for a large share of the load. How can I optimize the INSERT or the WHERE clause?
INSERT INTO dbo.PROJECTS_TESTS ( PROJECTID, ANOTHERTID, DOMAINID, is_test)
SELECT * FROM #temp_Test AS tC
WHERE NOT EXISTS (SELECT TOP 1 1
FROM dbo.PROJECTS_TESTS AS ps WITH (NOLOCK)
WHERE ps.PROJECTID = tC.projectId
AND ps.ANOTHERTID = tC.anotherLink
AND ps.DOMAINID = tC.DOMAINID
AND ps.is_test = tC.test_project
)
I think you'd be better served by doing a JOIN than EXISTS. Depending on the cardinality of your join condition (currently in your WHERE) you might need DISTINCT in there too.
INSERT INTO dbo.PROJECTS_TESTS ( PROJECTID, ANOTHERTID, DOMAINID, is_test)
SELECT tC.* -- use SELECT DISTINCT tC.* if #temp_Test can contain duplicate rows
FROM #temp_Test AS tC
LEFT OUTER JOIN dbo.PROJECTS_TESTS AS ps ON
ps.PROJECTID = tC.projectId
AND ps.ANOTHERTID = tC.anotherLink
AND ps.DOMAINID = tC.DOMAINID
AND ps.is_test = tC.test_project
WHERE ps.PROJECTID IS NULL
or something like that
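To make the two anti-join shapes concrete, here is a small sketch using Python's built-in sqlite3 rather than SQL Server. The table names are lowercased and cut down to two columns, and the rows are invented. It only shows that NOT EXISTS and LEFT JOIN ... IS NULL pick the same rows; it does not show which one SQL Server runs faster. The IS NULL test is reliable only if the tested column is declared NOT NULL in the target table, which the sketch assumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical two-column versions of the target table and the temp table.
    CREATE TABLE projects_tests (projectid INTEGER NOT NULL, domainid INTEGER NOT NULL);
    CREATE TABLE temp_test      (projectid INTEGER, domainid INTEGER);
    INSERT INTO projects_tests VALUES (1, 10), (2, 20);
    INSERT INTO temp_test      VALUES (1, 10), (3, 30), (3, 30), (4, 40);  -- (3, 30) appears twice
""")

not_exists = """
    SELECT tc.* FROM temp_test tc
    WHERE NOT EXISTS (SELECT 1 FROM projects_tests ps
                      WHERE ps.projectid = tc.projectid AND ps.domainid = tc.domainid)
"""
left_join = """
    SELECT tc.* FROM temp_test tc
    LEFT JOIN projects_tests ps
           ON ps.projectid = tc.projectid AND ps.domainid = tc.domainid
    WHERE ps.projectid IS NULL
"""

not_exists_rows = sorted(conn.execute(not_exists).fetchall())
left_join_rows = sorted(conn.execute(left_join).fetchall())
print(not_exists_rows == left_join_rows)

# Insert only the missing rows; DISTINCT collapses the duplicated (3, 30).
conn.execute("""
    INSERT INTO projects_tests (projectid, domainid)
    SELECT DISTINCT tc.projectid, tc.domainid FROM temp_test tc
    LEFT JOIN projects_tests ps
           ON ps.projectid = tc.projectid AND ps.domainid = tc.domainid
    WHERE ps.projectid IS NULL
""")
final_rows = sorted(conn.execute("SELECT * FROM projects_tests").fetchall())
print(final_rows)
```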

Handling sub query which retrieves large amount of data

I am using this query:
select test.* from
(
SELECT /*+ full(P) parallel(a,24) */
a.TRANS_DATE,a.STORE_NO,a.POS_NO,a.TICKET_NO,a.TICKET_START_TIME, a.INTERNAL_NO,
a.TRANS_DATE||a.STORE_NO||a.POS_NO||a.TICKET_NO||a.TICKET_START_TIME as VISITS,
s.city,s.region,
a.CUSTOMER_NO,
a.GROSS_AMT,
a.SALE_QTY,
a.DISCOUNT_AMT,
a.DISCOUNT_QTY,
S.FORMAT_CD,
S.FORMAT_DESC,
P.BRAND_ID,
P.BRAND_ID_DESC,
P.BRAND_TYPE,
P.BRAND_TYPE_DESC,
P.LEVEL1,P.LEVEL1_DESC,P.LEVEL2,
P.LEVEL2_DESC,P.LEVEL3,P.LEVEL3_DESC,P.LEVEL4,
P.LEVEL4_DESC,P.LEVEL5,P.LEVEL5_DESC,
P.Material_No,P.Medium_Desc,
b.mobile,
D.MOBILE,
ROW_NUMBER() OVER (PARTITION BY a.TRANS_DATE,a.STORE_NO,a.POS_NO,a.TICKET_NO,a.TICKET_START_TIME, a.INTERNAL_NO,
s.city,s.region,a.CUSTOMER_NO,a.GROSS_AMT,a.SALE_QTY,a.DISCOUNT_AMT,a.DISCOUNT_QTY,
S.FORMAT_CD,S.FORMAT_DESC,P.BRAND_ID,P.BRAND_ID_DESC,P.BRAND_TYPE,P.BRAND_TYPE_DESC,
P.LEVEL1,P.LEVEL1_DESC,P.LEVEL2,P.LEVEL2_DESC,P.LEVEL3,P.LEVEL3_DESC,P.LEVEL4,P.LEVEL4_DESC,
P.LEVEL5,P.LEVEL5_DESC,P.Material_No,P.Medium_Desc,b.mobile
ORDER BY D.MOBILE) as rnk
FROM P
INNER JOIN a ON a.Internal_No= P.Material_No
INNER JOIN S ON TO_CHAR(a.STORE_NO)= S.STORE_NO
LEFT OUTER JOIN b ON a.store_no=b.store_no and a.ticket_no=b.ticket_no and a.trans_date=b.trans_date
and a.ticket_start_time=b.ticket_start_time and a.pos_no=b.pos_no
LEFT OUTER JOIN d ON a.customer_no=d.customer_no
where a.TRANS_DATE between '07/04/2014' and sysdate
)test
where test.rnk=1
The problem with this query is that the subquery retrieves a large number of rows, and every time I run it I get the error below:
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP1.
How can I handle this without increasing the TEMP tablespace?
Best Regards

Joining on one of Two Tables Based on Parameter

Not sure if this can be done, but here is what I am trying to do.
I have two tables:
Table 1 is called Task and it contains all of the possible Task Names
Table 2 is called Task_subset and it contains only a subset of the Task Names included in Table 1
I have a variable called @TaskControl that is passed in as a parameter; it is equal to either Table1 or Table2.
Based on the value of the @TaskControl variable I want to join one of my Task tables.
For example:
If @TaskControl = 'Table1':
Select * From Orders O Join Task T on T.id = O.id
If @TaskControl = 'Table2':
Select * From Orders O Join Task_subset T on T.id = O.id
How would I do this? I am on SQL Server 2008.
Don't overcomplicate it. Put it into a stored proc like so:
CREATE PROCEDURE dbo.MyProcedure(@TaskControl varchar(20))
AS
If @TaskControl = 'Table1'
Select * From Orders O Join Task T on T.id = O.id
ELSE If @TaskControl = 'Table2'
Select * From Orders O Join Task_subset T on T.id = O.id
ELSE SELECT 'Invalid Parameter'
Or just straight TSQL with no proc:
If @TaskControl = 'Table1'
Select * From Orders O Join Task T on T.id = O.id
ELSE If @TaskControl = 'Table2'
Select * From Orders O Join Task_subset T on T.id = O.id
Doing it exactly as you do it right now is the best way. Having one single statement that attempts to somehow dynamically join one of two tables is the last thing you want. T-SQL is a language for data access, not for DRY code-reuse programming. If you attempt to have a single statement, then the optimizer has to come up with a plan that always works, no matter the value of @TaskControl, and so the plan will always have to join both tables.
A more lengthy discussion on this topic is Dynamic Search Conditions in T-SQL (your dynamic join falls into the same topic as dynamic search).
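The same "one fixed statement per case" idea carries over to application code. Below is a rough sketch in Python with the built-in sqlite3 module. The `fetch_orders` helper, the `QUERIES` mapping, the `name` column and the sample rows are all hypothetical. The point is only that the parameter selects between two complete, static statements, so each one can be planned on its own, and an unexpected value is rejected instead of being spliced into the SQL text.

```python
import sqlite3

# One complete, fixed statement per allowed parameter value.
QUERIES = {
    "Table1": "SELECT o.id, t.name FROM orders o JOIN task t ON t.id = o.id ORDER BY o.id",
    "Table2": "SELECT o.id, t.name FROM orders o JOIN task_subset t ON t.id = o.id ORDER BY o.id",
}


def fetch_orders(conn, task_control):
    """Run the statement that matches task_control; reject anything else."""
    try:
        sql = QUERIES[task_control]
    except KeyError:
        raise ValueError(f"Invalid parameter: {task_control!r}") from None
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy data: task holds every task, task_subset holds only some of them.
    CREATE TABLE orders      (id INTEGER);
    CREATE TABLE task        (id INTEGER, name TEXT);
    CREATE TABLE task_subset (id INTEGER, name TEXT);
    INSERT INTO orders      VALUES (1), (2), (3);
    INSERT INTO task        VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO task_subset VALUES (2, 'b');
""")

print(fetch_orders(conn, "Table1"))
print(fetch_orders(conn, "Table2"))
```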
If they are UNION compatible you could give this a shot. From a quick test this end it only appears to access the relevant table.
I do agree more with JNK's and Remus's answers however. This does have a recompilation cost for every invocation and not much benefit.
;WITH T AS
(
SELECT 'Table1' AS TaskControl, id
FROM Task
UNION ALL
SELECT 'Table2' AS TaskControl, id
FROM Task_subset
)
SELECT *
FROM T
JOIN Orders O on T.id = O.id
WHERE TaskControl = @TaskControl
OPTION (RECOMPILE)
I don't know how good performance would be, and this would not scale well as you add on additional optional tables, but this should work in the situation that you present.
SELECT
O.some_column,
COALESCE(T.some_task_column, TS.some_task_subset_column)
FROM
Orders O
LEFT OUTER JOIN Tasks T ON
@task_control = 'Tasks' AND
T.id = O.id
LEFT OUTER JOIN Task_Subsets TS ON
@task_control = 'Task Subsets' AND
TS.id = O.id
Try the following. It should avoid the stored procedure plan getting bound based on the value of the parameter passed during the first execution of the stored procedure (See SQL Server Parameter Sniffing for details):
create proc dbo.foo
@TaskControl varchar(32)
as
declare @selection varchar(32)
set @selection = @TaskControl
select *
from dbo.Orders t
join dbo.Task t1 on t1.id = t.id
where @selection = 'Table1'
UNION ALL
select *
from dbo.Orders t
join dbo.Task_subset t1 on t1.id = t.id
where @selection = 'Table2'
return 0
go
The stored procedure shouldn't get recompiled for each invocation either, as @Martin suggested might happen, and the parameter value first passed in should not influence the execution plan that gets bound. But if performance is an issue, run a SQL trace with Profiler and see whether the cached execution plan is reused or a recompile is triggered.
One thing, though: you will need to ensure that each individual select in the UNION returns the exact same columns. Each select in a UNION must have the same number of columns, and each column must have a common type (or a default conversion to the common type). The first select defines the number, types and names of the columns in the result set.
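For what it's worth, the UNION ALL pattern with a parameter test in each branch is easy to try out. This is a minimal sketch in Python with the built-in sqlite3 module, not SQL Server, so it says nothing about plan caching or recompiles. The `fetch` helper, the `name` column and the sample rows are made up. Both branches list the same two columns explicitly, which is one way to satisfy the column-matching rule described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy data: task holds every task, task_subset holds only some of them.
    CREATE TABLE orders      (id INTEGER);
    CREATE TABLE task        (id INTEGER, name TEXT);
    CREATE TABLE task_subset (id INTEGER, name TEXT);
    INSERT INTO orders      VALUES (1), (2), (3);
    INSERT INTO task        VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO task_subset VALUES (2, 'b');
""")

# Each branch returns the same columns; the parameter test switches a branch on or off.
SQL = """
    SELECT o.id, t.name FROM orders o JOIN task t ON t.id = o.id
    WHERE :selection = 'Table1'
    UNION ALL
    SELECT o.id, t.name FROM orders o JOIN task_subset t ON t.id = o.id
    WHERE :selection = 'Table2'
    ORDER BY 1
"""


def fetch(selection):
    """Run the combined statement for one value of the selection parameter."""
    return conn.execute(SQL, {"selection": selection}).fetchall()


print(fetch("Table1"))
print(fetch("Table2"))
print(fetch("something else"))  # neither branch matches, so no rows come back
```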

SQL Stored Procedure Performance Fine on SQL2008 but Awful on SQL2005

I have a stored procedure that I have developed on a SQL2008 server that runs <1sec. On another server which is SQL2005 the same sp on the same database takes ~1minute. Without going into the details of the database schema can anyone see anything obvious in this SP that may cause this performance discrepancy? Could it be the use of the CTE? Is there an alternative?
EDIT - I have now noticed that if I run the SQL directly on SQL 2005 it runs in ~4secs but executing the SP still takes over a minute?? Looks like the problem may lie in the SP execution??
CREATE PROCEDURE Workflow.GetTopTasks
-- Add the parameters for the stored procedure here
@ownerUserId int,
@topN int = 10
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
SET ROWCOUNT @topN;
-- Insert statements for procedure here
WITH cteCalculatedDate (MilestoneDateId, CalculatedMilestoneDate)
AS
(
-- Anchor member definition
SELECT md.MilestoneDateId, md.SpecifiedDate
FROM Workflow.MilestoneDate md
WHERE md.RelativeMilestoneDateId IS NULL
UNION ALL
-- Recursive member definition
SELECT md.MilestoneDateId, CalculatedMilestoneDate + md.RelativeDays
FROM Workflow.MilestoneDate md
INNER JOIN cteCalculatedDate cte
on md.RelativeMilestoneDateId = cte.MilestoneDateId
)
-- Statement that executes the CTE
select
we.*
from Workflow.WorkflowElement we
left outer join cteCalculatedDate cte
on cte.MilestoneDateId = we.DueDateId
inner join Workflow.WorkflowInstance wi
on wi.WorkflowInstanceId = we.WorkflowInstanceId
left outer join Workflow.SchemeWorkflow sw
on sw.WorkflowInstanceId = wi.WorkflowInstanceId
left outer join Workflow.Scheme s
on s.SchemeId = sw.SchemeId
inner join Workflow.WorkflowDefinition wd
on wd.WorkflowDefinitionId = wi.WorkflowDefinitionId
where
we.OwnerId = @ownerUserId -- for given owner
and we.CompletedDate is null -- is not completed
and we.ElementTypeId <= 4 -- is Action, Data, Decision or Document (Not End, Start or KeyDate)
and cte.CalculatedMilestoneDate is not null -- has a duedate
UNION
select
we.*
from Workflow.WorkflowElement we
left outer join cteCalculatedDate cte
on cte.MilestoneDateId = we.DueDateId
inner join Workflow.WorkflowInstance wi
on wi.WorkflowInstanceId = we.WorkflowInstanceId
left outer join Workflow.SchemeWorkflow sw
on sw.WorkflowInstanceId = wi.WorkflowInstanceId
left outer join Workflow.Scheme s
on s.SchemeId = sw.SchemeId
inner join Workflow.WorkflowDefinition wd
on wd.WorkflowDefinitionId = wi.WorkflowDefinitionId
where
we.OwnerId = @ownerUserId -- for given owner
and we.CompletedDate is null -- is not completed
and we.ElementTypeId <= 4 -- is Action, Data, Decision or Document (Not End, Start or KeyDate)
and cte.CalculatedMilestoneDate is null -- does NOT have a duedate
SET ROWCOUNT 0
END
EDIT - I have now noticed that if I
run the SQL directly on SQL 2005 it
runs in ~4secs but executing the SP
still takes over a minute??
Bad parameter sniffing then:
http://elegantcode.com/2008/05/17/sql-parameter-sniffing-and-what-to-do-about-it/
SQL poor stored procedure execution plan performance - parameter sniffing
Parameter sniffing was bad in 2005, but better in 2008.
Your UNION selects the rows where CalculatedMilestoneDate is NULL and the rows where it is not NULL.
This is redundant: the entire UNION can be removed by simply dropping the condition on CalculatedMilestoneDate from the WHERE clause.
Other than that, you should verify that both databases have the same indexes defined.
-- Statement that executes the CTE
select
we.*
from Workflow.WorkflowElement we
left outer join cteCalculatedDate cte
on cte.MilestoneDateId = we.DueDateId
inner join Workflow.WorkflowInstance wi
on wi.WorkflowInstanceId = we.WorkflowInstanceId
left outer join Workflow.SchemeWorkflow sw
on sw.WorkflowInstanceId = wi.WorkflowInstanceId
left outer join Workflow.Scheme s
on s.SchemeId = sw.SchemeId
inner join Workflow.WorkflowDefinition wd
on wd.WorkflowDefinitionId = wi.WorkflowDefinitionId
where
we.OwnerId = @ownerUserId -- for given owner
and we.CompletedDate is null -- is not completed
and we.ElementTypeId <= 4 -- is Action, Data, Decision or Document (Not End, Start or KeyDate)
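The redundancy claim can be checked on a toy schema. The sketch below uses Python's built-in sqlite3 with two invented tables, `element` and `milestone`, standing in for WorkflowElement and the CTE. It compares the UNION of the IS NOT NULL and IS NULL branches against the plain join. The results match here because the plain join produces no duplicate rows; UNION would otherwise collapse duplicates, so treat this as an illustration under that assumption rather than a proof for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins: element ~ WorkflowElement, milestone ~ cteCalculatedDate.
    CREATE TABLE element   (element_id INTEGER PRIMARY KEY, due_date_id INTEGER);
    CREATE TABLE milestone (milestone_date_id INTEGER PRIMARY KEY, calculated_date TEXT);
    INSERT INTO milestone VALUES (1, '2010-01-01'), (2, NULL);
    INSERT INTO element   VALUES (10, 1), (11, 2), (12, NULL), (13, 99);
""")

with_union = """
    SELECT e.* FROM element e
    LEFT JOIN milestone m ON m.milestone_date_id = e.due_date_id
    WHERE m.calculated_date IS NOT NULL
    UNION
    SELECT e.* FROM element e
    LEFT JOIN milestone m ON m.milestone_date_id = e.due_date_id
    WHERE m.calculated_date IS NULL
"""
without_union = """
    SELECT e.* FROM element e
    LEFT JOIN milestone m ON m.milestone_date_id = e.due_date_id
"""


def by_id(rows):
    """Sort on the unique element_id only, so NULLs in other columns are never compared."""
    return sorted(rows, key=lambda row: row[0])


union_rows = by_id(conn.execute(with_union).fetchall())
plain_rows = by_id(conn.execute(without_union).fetchall())
print(union_rows == plain_rows)
```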
If the schemas match, then perhaps you are missing important indexes in the SQL Server 2005 instance. Try running the Database Engine Tuning Advisor and applying its index recommendations.