I need help in improving performance of a SQL query. I have a tblThings and tblTexts. tblThings has around 10 TextIDs. The query I have left joins on these tables 10 times which is quite slow.
Here is a simplified version of my query:
SELECT th.ThingID, th.DescTextID, th.ColourTextID, tDesc.Text AS Desc, tCol.Text AS Colour
FROM tblThings th
LEFT JOIN tblTexts tDesc ON tDesc.TextID = th.DescTextID
LEFT JOIN tblTexts tCol ON tCol.TextID = th.ColourTextID
--and so on around 10 times
I would appreciate it the help.
There is nothing wrong with the query. As you must look up text in a text table (for internationalization reasons probably), you are forced to write the query thus.
There should be an index on tblTexts(TextID) of course.
Related
I'm working on an oracle query that is doing a select on a huge table, however the joins with other tables seem to be costing a lot in terms of time of processing.
I'm looking for tips on how to improve the working of this query.
I'm attaching a version of the query and the explain plan of it.
Query
SELECT
l.gl_date,
l.REST_OF_TABLES
(
SELECT
MAX(tt.task_id)
FROM
bbb.jeg_pa_tasks tt
WHERE
l.project_id = tt.project_id
AND l.task_number = tt.task_number
) task_id
FROM
aaa.jeg_labor_history l,
bbb.jeg_pa_projects_all p
WHERE
p.org_id = 2165
AND l.project_id = p.project_id
AND p.project_status_code = '1000'
Something to mention:
This query takes data from oracle to send it to a sql server database, so I need it to be this big, I can't narrow the scope of the query.
the purpose is to set it to a sql server job with SSIS so it runs periodically
One obvious suggestion is not to use sub query in select clause.
Instead, you can try to join the tables.
SELECT
l.gl_date,
l.REST_OF_TABLES
t.task_id
FROM
aaa.jeg_labor_history l
Join bbb.jeg_pa_projects_all p
On (l.project_id = p.project_id)
Left join (SELECT
tt.project_id,
tt.task_number,
MAX(tt.task_id) task_id
FROM
bbb.jeg_pa_tasks tt
Group by tt.project_id, tt.task_number) t
On (l.project_id = t.project_id
AND l.task_number = t.task_number)
WHERE
p.org_id = 2165
AND p.project_status_code = '1000';
Cheers!!
As I don't know exactly how many rows this query is returning or how many rows this table/view has.
I can provide you few simple tips which might be helpful for you for better query performance:
Check Indexes. There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement.
Limit the size of your working data set.
Only select columns you need.
Remove unnecessary tables.
Remove calculated columns in JOIN and WHERE clauses.
Use inner join, instead of outer join if possible.
You view contains lot of data so you can also break down and limit only the information you need from this view
I was just tidying up some sql when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is does it make any performance difference the order of the joins as for the above query I would have done
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other 2 tables to that one
Join order in SQL2008R2 server does unquestionably affect query performance, particularly in queries where there are a large number of table joins with where clauses applied against multiple tables.
Although the join order is changed in optimisation, the optimiser does't try all possible join orders. It stops when it finds what it considers a workable solution as the very act of optimisation uses precious resources.
We have seen queries that were performing like dogs (1min + execution time) come down to sub second performance just by changing the order of the join expressions. Please note however that these are queries with 12 to 20 joins and where clauses on several of the tables.
The trick is to set your order to help the query optimiser figure out what makes sense. You can use Force Order but that can be too rigid. Try to make sure that your join order starts with the tables where the will reduce data most through where clauses.
No, the JOIN by order is changed during optimization.
The only caveat is the Option FORCE ORDER which will force joins to happen in the exact order you have them specified.
I have a clear example of inner join affecting performance. It is a simple join between two tables. One had 50+ million records, the other has 2,000. If I select from the smaller table and join the larger it takes 5+ minutes.
If I select from the larger table and join the smaller it takes 2 min 30 seconds.
This is with SQL Server 2012.
To me this is counter intuitive since I am using the largest dataset for the initial query.
Usually not. I'm not 100% this applies verbatim to Sql-Server, but in Postgres the query planner reserves the right to reorder the inner joins as it sees fit. The exception is when you reach a threshold beyond which it's too expensive to investigate changing their order.
JOIN order doesn't matter, the query engine will reorganize their order based on statistics for indexes and other stuff.
For test do the following:
select show actual execution plan and run first query
change JOIN order and now run the query again
compare execution plans
They should be identical as the query engine will reorganize them according to other factors.
As commented on other asnwer, you could use OPTION (FORCE ORDER) to use exactly the order you want but maybe it would not be the most efficient one.
AS a general rule of thumb, JOIN order should be with table of least records on top, and most records last, as some DBMS engines the order can make a difference, as well as if the FORCE ORDER command was used to help limit the results.
Wrong. SQL Server 2005 it definitely matters since you are limiting the dataset from the beginning of the FROM clause. If you start with 2000 records instead of 2 million it makes your query faster.
I was just tidying up some sql when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is does it make any performance difference the order of the joins as for the above query I would have done
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other 2 tables to that one
Join order in SQL2008R2 server does unquestionably affect query performance, particularly in queries where there are a large number of table joins with where clauses applied against multiple tables.
Although the join order is changed in optimisation, the optimiser does't try all possible join orders. It stops when it finds what it considers a workable solution as the very act of optimisation uses precious resources.
We have seen queries that were performing like dogs (1min + execution time) come down to sub second performance just by changing the order of the join expressions. Please note however that these are queries with 12 to 20 joins and where clauses on several of the tables.
The trick is to set your order to help the query optimiser figure out what makes sense. You can use Force Order but that can be too rigid. Try to make sure that your join order starts with the tables where the will reduce data most through where clauses.
No, the JOIN by order is changed during optimization.
The only caveat is the Option FORCE ORDER which will force joins to happen in the exact order you have them specified.
I have a clear example of inner join affecting performance. It is a simple join between two tables. One had 50+ million records, the other has 2,000. If I select from the smaller table and join the larger it takes 5+ minutes.
If I select from the larger table and join the smaller it takes 2 min 30 seconds.
This is with SQL Server 2012.
To me this is counter intuitive since I am using the largest dataset for the initial query.
Usually not. I'm not 100% this applies verbatim to Sql-Server, but in Postgres the query planner reserves the right to reorder the inner joins as it sees fit. The exception is when you reach a threshold beyond which it's too expensive to investigate changing their order.
JOIN order doesn't matter, the query engine will reorganize their order based on statistics for indexes and other stuff.
For test do the following:
select show actual execution plan and run first query
change JOIN order and now run the query again
compare execution plans
They should be identical as the query engine will reorganize them according to other factors.
As commented on other asnwer, you could use OPTION (FORCE ORDER) to use exactly the order you want but maybe it would not be the most efficient one.
AS a general rule of thumb, JOIN order should be with table of least records on top, and most records last, as some DBMS engines the order can make a difference, as well as if the FORCE ORDER command was used to help limit the results.
Wrong. SQL Server 2005 it definitely matters since you are limiting the dataset from the beginning of the FROM clause. If you start with 2000 records instead of 2 million it makes your query faster.
I've been toying around with switching from ms-access files to SQLite files for my simple database needs; for the usual reasons: smaller file size, less overhead, open source, etc.
One thing that is preventing me from making the switch is what seems to be a lack of speed in SQLite. For simple SELECT queries, SQLite seems to perform as well as, or better than MS-Access. The problem occurs with a fairly complex SELECT query with multiple INNER JOIN statements:
SELECT DISTINCT
DESCRIPTIONS.[oCode] AS OptionCode,
DESCRIPTIONS.[descShort] AS OptionDescription
FROM DESCRIPTIONS
INNER JOIN tbl_D_E ON DESCRIPTIONS.[oCode] = tbl_D_E.[D]
INNER JOIN tbl_D_F ON DESCRIPTIONS.[oCode] = tbl_D_F.[D]
INNER JOIN tbl_D_H ON DESCRIPTIONS.[oCode] = tbl_D_H.[D]
INNER JOIN tbl_D_J ON DESCRIPTIONS.[oCode] = tbl_D_J.[D]
INNER JOIN tbl_D_T ON DESCRIPTIONS.[oCode] = tbl_D_T.[D]
INNER JOIN tbl_Y_D ON DESCRIPTIONS.[oCode] = tbl_Y_D.[D]
WHERE ((tbl_D_E.[E] LIKE '%')
AND (tbl_D_H.[oType] ='STANDARD')
AND (tbl_D_J.[oType] ='STANDARD')
AND (tbl_Y_D.[Y] = '41')
AND (tbl_Y_D.[oType] ='STANDARD')
AND (DESCRIPTIONS.[oMod]='D'))
In MS-Access, this query executes in about 2.5 seconds. In SQLite, it takes a little over 8 minutes. It takes the same amount of time whether I'm running the query from VB code or from the command prompt using sqlite3.exe.
So my questions are the following:
Is SQLite just not optimized to handle multiple INNER JOIN statements?
Have I done something obviously stupid in my query (because I am new to SQLite) that makes it so slow?
And before anyone suggests a completely different technology, no I can not switch. My choices are MS-Access or SQLite. :)
UPDATE:
Assigning an INDEX to each of the columns in the SQLite database reduced the query time from over 8 minutes down to about 6 seconds. Thanks to Larry Lustig for explaining why the INDEXing was needed.
As requested, I'm reposting my previous comment as an actual answer (when I first posted the comment I was not able, for some reason, to post it as an answer):
MS Access is very aggressive about indexing columns on your behalf, whereas SQLite will require you to explicitly create the indexes you need. So, it's possible that Access has indexed either [Description] or [D] for you but that those indexes are missing in SQLite. I don't have experience with that amount of JOIN activity in SQLite. I used it in one Django project with a relatively small amount of data and did not detect any performance issues.
Do you have issues with referencial integrity? I ask because have the impression you've got unnecessary joins, so I re-wrote your query as:
SELECT DISTINCT
t.[oCode] AS OptionCode,
t.[descShort] AS OptionDescription
FROM DESCRIPTIONS t
JOIN tbl_D_H h ON h.[D] = t.[oCode]
AND h.[oType] = 'STANDARD'
JOIN tbl_D_J j ON j.[D] = t.[oCode]
AND j.[oType] = 'STANDARD'
JOIN tbl_Y_D d ON d.[D] = t.[oCode]
AND d.[Y] = '41'
AND d.[oType] ='STANDARD'
WHERE t.[oMod] = 'D'
If DESCRIPTIONS and tbl_D_E have multiple row scans then oCode and D should be indexed. Look at example here to see how to index and tell how many row scans there are (http://www.siteconsortium.com/h/p1.php?id=mysql002).
This might fix it though ..
CREATE INDEX ocode_index ON DESCRIPTIONS (oCode) USING BTREE;
CREATE INDEX d_index ON tbl_D_E (D) USING BTREE;
etc ....
Indexing correctly is one piece of the puzzle that can easily double, triple or more the speed of the query.
I am currently working with a query in in MSSQL that looks like:
SELECT
...
FROM
(SELECT
...
)T1
JOIN
(SELECT
...
)T2
GROUP BY
...
The inner selects are relatively fast, but the outer select aggregates the inner selects and takes an incredibly long time to execute, often timing out. Removing the group by makes it run somewhat faster and changing the join to a LEFT OUTER JOIN speeds things up a bit as well.
Why would doing a group by on a select which aggregates two inner selects cause the query to run so slow? Why does an INNER JOIN run slower than a LEFT OUTER JOIN? What can I do to troubleshoot this further?
EDIT: What makes this even more perplexing is the two inner queries are date limited and the overall query only runs slow when looking at date ranges between the start of July and any other day in July, but if the date ranges are anytime before the the July 1 and Today then it runs fine.
Without some more detail of your query its impossible to offer any hints as to what may speed your query up. A possible guess is the two inner queries are blocking access to any indexes which might have been used to perform the join resulting in large scans but there are probably many other possible reasons.
To check where the time is used in the query check the execution plan, there is a detailed explanation here
http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx
The basic run down is run the query, and display the execution plan, then look for any large percentages - they are what is slowing your query down.
Try rewriting your query without the nested SELECTs, which are rarely necessary. When using nested SELECTs - except for trivial cases - the inner SELECT resultsets are not indexed, which makes joining them to anything slow.
As Tetraneutron said, post details of your query -- we may help you rewrite it in a straight-through way.
Have you given a join predicate? Ie join table A ON table.ColA = table.ColB. If you don't give a predicate then SQL may be forced to use nested loops, so if you have a lot of rows in that range it would explain a query slow down.
Have a look at the plan in the SQL studio if you have MS Sql Server to play with.
After your t2 statement add a join condition on t1.joinfield = t2.joinfield
The issue was with fragmented data. After the data was defragmented the query started running within reasonable time constraints.
JOIN = Cartesian Product. All columns from both tables will be joined in numerous permutations. It is slow because the inner queries are querying each of the separate tables, but once they hit the join, it becomes a Cartesian product and is more difficult to manage. This would occur at the outer select statement.
Have a look at INNER JOINs as Tetraneutron recommended.